
CAFLEC, No. 123, Sep. 2008

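A Good Platform for English Teachers and Learners: the Corpus of Contemporary American English (COCA)

WANG Xing-fu (1), Mark Davies (2), LIU Guo-hui (3)
(1. College of Foreign Languages, Chongqing University, Chongqing 400030; 2. Brigham Young University, USA;
3. College of Foreign Languages, Hangzhou Normal University, Hangzhou 310036)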

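Abstract: This article introduces the Corpus of Contemporary American English (COCA), created by Prof. Mark Davies. The corpus contains 360 million words of material published in the United States over the 18 years from 1990 to 2007, and it is free to use online.

Key words: COCA; balanced English corpus
CLC number: H319.3  Document code: B  Article ID: 1001-5795(2008)05-0027-0007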

Size (Size), speed (Speed), and annotation (Annotation) are the three considerations named in Davies (2005: 301).

1 COCA

COCA (Corpus of Contemporary American English) (http://www.americancorpus.org/) was developed by Mark Davies of Brigham Young University. It holds 360 million words (108 GB; 8 × 2.4 GHz), is tagged with CLAWS7, and covers 18 years (1990-2007). COCA has been online since 20 February 2008, and Mark Davies presented it at AACL-2008 (American Association for Corpus Linguistics).

The make-up of COCA is given in Table 1, Table 2, and Table 3.

Table 1 COCA word counts by genre and by period (millions of words)

SPOK   FIC    MAG    NEWS   ACAD   1990-1994  1995-1999  2000-2004  2005-2007
76.6   69.6   78.1   73.4   73.0   103.1      102.3      102.9      62.4
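As a quick consistency check on Table 1, the five genre figures and the four period figures should each add up to the same corpus size, close to the 370,691,937 words listed in Table 2. The sketch below uses only the numbers printed in Table 1 and assumes they are in millions of words; the variable names are illustrative.

```python
# Word counts copied from Table 1 (assumed to be in millions of words).
genres = {"SPOK": 76.6, "FIC": 69.6, "MAG": 78.1, "NEWS": 73.4, "ACAD": 73.0}
periods = {
    "1990-1994": 103.1,
    "1995-1999": 102.3,
    "2000-2004": 102.9,
    "2005-2007": 62.4,
}

# Round to one decimal place to match the precision of the table.
genre_total = round(sum(genres.values()), 1)
period_total = round(sum(periods.values()), 1)

print(f"by genre:  {genre_total} million")
print(f"by period: {period_total} million")
```

Both totals come out the same, which suggests the two halves of the table describe the same set of texts split two different ways.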

Author notes: Mark Davies, Brigham Young University.
Manuscript dates: 2008-02-18; 2008-03-17; 2008-07-30.


1994-2009 China Academic Journal Electronic Publishing House. All rights reserved. http://www.cnki.net

Table 2 COCA

( ) (types) (lemmas) Hapaxes

370,691,937   147,093   2,297,689   2,436,450   1,455,301   22,136,258

Table 3 COCA sub-genres (42)

SPOK (9)
  ABC            13,132,214
  CBS            10,702,932
  CNN            18,579,309
  FOX             3,883,546
  Indep           4,513,182
  MSNBC             809,105
  NBC             4,009,442
  NPR            15,315,470
  PBS             5,623,129

MAG (11)
  Afric-Amer      3,357,201
  Children        2,228,071
  Entertain       3,479,537
  Financial       5,311,157
  Women/Men       5,858,827
  Home/Health    12,963,244
  News/Opin      15,843,590
  Religion        3,009,200
  Sci/Tech        9,881,440
  Soc/Arts        6,696,799
  Sports          9,440,869

ACAD (9)
  Education       6,409,519
  Geog/SocSci    12,928,300
  History        10,367,436
  Humanities      9,629,214
  Law/PolSci      7,887,439
  Medicine        4,512,889
  Misc            3,358,303
  Phil/Rel        5,902,797
  Sci/Tech       12,087,448

NEWS (8)
  Editorial       4,063,608
  Life           12,883,821
  Misc           24,691,477
  Money           6,295,632
  News_Intl       3,731,400
  News_Local      5,237,152
  News_Natl       5,318,754
  Sports         11,162,901

FIC (5)
  Gen (Book)     14,266,742
  Gen (Jrnl)     28,894,677
  Juvenile        2,794,394
  Movies          9,208,596
  SciFi/Fant     14,456,959

2 COCA (Figure 1)

The COCA page (Figure 1) has four parts: (A) the title, The Corpus of Contemporary American English (COCA) (Mark Davies / Brigham Young University), and below it the three parts (B), (C), and (D).

Figure 1 laugh.[n*]

2.1 (B)

(B) has four parts: (DISPLAY), (SEARCH STRING), (SECTION), and (CLICK TO SEE OPTIONS).

2.1.1 (DISPLAY)

(DISPLAY) offers (LIST), (CHART), and (COMPARE WORDS). With (LIST) and (SECTIONS) set to NO, the results in (C) follow the (SORT) setting, for example (FREQUENCY). (CHART) shows how the (WORD(S)) entry is distributed across sections, where (LIST) shows the individual matches.
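The sub-genre counts in Table 3 can be added up per genre and set beside the genre figures in Table 1. The sketch below copies the counts from Table 3; the dictionary layout and names are illustrative, and the totals are only expected to land close to Table 1, since that table is rounded.

```python
# Sub-genre word counts copied from Table 3, grouped by genre.
table3 = {
    "SPOK": [13132214, 10702932, 18579309, 3883546, 4513182,
             809105, 4009442, 15315470, 5623129],
    "MAG": [3357201, 2228071, 3479537, 5311157, 5858827, 12963244,
            15843590, 3009200, 9881440, 6696799, 9440869],
    "ACAD": [6409519, 12928300, 10367436, 9629214, 7887439,
             4512889, 3358303, 5902797, 12087448],
    "NEWS": [4063608, 12883821, 24691477, 6295632, 3731400,
             5237152, 5318754, 11162901],
    "FIC": [14266742, 28894677, 2794394, 9208596, 14456959],
}

# Total words per genre and the overall number of sub-genres.
genre_totals = {genre: sum(counts) for genre, counts in table3.items()}
n_subgenres = sum(len(counts) for counts in table3.values())

for genre, total in genre_totals.items():
    print(f"{genre}: {total:,} words ({total / 1e6:.1f} million)")
print("sub-genres:", n_subgenres)
```

The sub-genre count matches the 42 given in the caption of Table 3.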


Figure 2 [tall] (LIST, SECTIONS = YES)

(COMPARE WORDS) compares two words through their (CONTEXT).

(DISPLAY) works together with (SORT), which has 3 values: FREQUENCY, RELEVANCE, and ALPHABETICAL. (RELEVANCE) deals with empty words (empty words) by ranking results on Mutual Information (MI); with (RELEVANCE), the words that set rob apart from steal come to the top. FREQUENCY ranks by frequency and ALPHABETICAL by spelling. With (SECTIONS) set to YES, (C) also shows the figures for each section (Figure 2).

2.1.2 (SEARCH STRING)

(SEARCH STRING) has four fields (Figure 3): (WORD(S)), (CONTEXT), (POS LIST), and (USER LISTS).

Figure 3 (SEARCH STRING)

2.1.2.1 (WORD(S))

(WORD(S)) ([1]) takes the word or phrase to search for.

2.1.2.2 (CONTEXT)

(CONTEXT) ([2]) takes the collocate to look for near the (WORD(S)) entry. ([3]) sets how many words to the left of (WORD(S)) are searched for the CONTEXT entry, and ([4]) how many to the right; each runs from 0 to 9, where 0 means that side is not searched.

2.1.2.3 (POS LIST)

(POS LIST) ([5]) inserts a part-of-speech tag into the query. The (POS LIST) menu holds 39 tags.

2.1.2.4 (USER LISTS)

(USER LISTS) brings a saved word list into the (SEARCH STRING).

2.1.3 (SECTION)

(SECTION) limits a search by (Genre) or (Year); IGNORE leaves it unrestricted. Several sections can be selected with Ctrl. With CHART, (C) offers SEE ALL SECTIONS. MIN FREQ sets the minimum frequency, for example 10.


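2.1.4 (CLICK TO SEE OPTIONS)

(CLICK TO SEE OPTIONS) opens four further settings: GROUP BY, DISPLAY, SAVE LISTS, and #HITS. HIDE OPTIONS closes them again.

2.1.4.1 (GROUP BY)

(GROUP BY) has 5 values: WORDS, LEMMAS, NONE, BOTH WORDS, and BOTH LEMMAS. WORDS groups results by word form and LEMMAS by lemma; NONE also separates results by their CLAWS tag; BOTH WORDS and BOTH LEMMAS group on the (WORD(S)) and (CONTEXT) entries together.

2.1.4.2 (DISPLAY)

(DISPLAY) has 4 values: RAW FREQ, PER/MIL, RAW FREQ +, and PER/MIL +. They apply to (C) when DISPLAY is CHART or SECTIONS is YES.

2.1.4.3 (SAVE LISTS) and (#HITS)

(SAVE LISTS) stores the results in (C) for later use under (USER LISTS). (#HITS) is set to 100 at first, so (C) shows 100 results; it can be raised as far as 1000.

2.2 (C)

(C) lists the results of the search set up in (B); with LIST it shows 100 of them. Table 4 gives the first 5 results for laugh.[n*] (CONTEXT *; 5; SORT RELEVANCE; Section 1; MIN FREQ 10).

Table 4 laugh.[n*]

     CONTEXT     TOT    ALL     %       MI
1    MIRTHLESS   21     59      35.59   7.68
2    THROATY     45     327     13.76   6.73
3    RUEFUL      19     278     6.83    6.03
4    CACKLING    12     176     6.82    6.03
5    HEARTY      94     1448    6.49    5.98

CONTEXT lists the 5 collocates of laugh. TOT is how often the collocate (for example mirthless) occurs with laugh; ALL is how often the collocate (mirthless) occurs in the whole corpus; % is TOT as a percentage of ALL; MI (Mutual Information) measures how strongly mirthless and laugh go together.

Clicking a word under CONTEXT in (C) shows its occurrences in (D) (Figure 1). With CHART selected in (B), (C) shows a chart (Figure 5).

With DISPLAY in (B) set to COMPARE WORDS, (C) compares two words. COMPARE WORDS can be tried with rob.[v*] and steal.[v*] and a CONTEXT entry.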
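The % column of Table 4 can be reproduced from the TOT and ALL columns. The sketch below assumes, as the figures themselves bear out, that % is TOT divided by ALL, expressed as a percentage and rounded to two decimals; the function name is illustrative. The MI column is not recomputed, because the text does not give the formula or the frequency of laugh.

```python
# (collocate, TOT, ALL) rows copied from Table 4.
rows = [
    ("MIRTHLESS", 21, 59),
    ("THROATY", 45, 327),
    ("RUEFUL", 19, 278),
    ("CACKLING", 12, 176),
    ("HEARTY", 94, 1448),
]


def percent(tot: int, all_: int) -> float:
    """Share of a collocate's occurrences that fall next to the search word."""
    return round(100 * tot / all_, 2)


percents = [percent(tot, all_) for _, tot, all_ in rows]

for (word, tot, all_), pct in zip(rows, percents):
    print(f"{word:<10} {tot:>4} {all_:>5} {pct:>6.2f}")
```

A high % with a low ALL, as for mirthless, marks a word that is rare overall but appears with laugh in a large share of its uses.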


Table 5 Query syntax for (WORD(S))

Query | Matches | Examples
jumbo | a single word |
soft landing | a phrase |
borrow/lend | either borrow or lend |
fairly * | fairly followed by any word |
un*ly | words beginning with un and ending in ly | unlikely, unusually
[slip].[v*] | all forms of slip as a verb | slipped, slip, slipping, slips
*heart* | words containing heart |
r?n* | r, any one letter, n, then any letters | run, running, ran
[sing] | all forms of sing | sing, singing, sang; song
[=publish] | synonyms of publish | publish, circulate, announce
[=publish] | synonyms of publish in COCA | announce, circulated, publishes
[=knock] the door | synonyms of knock followed by the door | slam, hit, crack, pound, bash
thick [nn*] | thick followed by a noun (tag taken from POS LIST) |
un*ed.[j*] | adjectives beginning with un and ending in ed | united, unexpected, unprecedented, unidentified
dis*.[v?d] | past-tense verbs beginning with dis | discovered, disappeared, discussed
dis* [v?d] | a word beginning with dis, followed by a past-tense verb | district had, disease had
*ly.[j*] | adjectives ending in ly |
*ly.[r*] *ly.[j*] | an adverb ending in ly followed by an adjective ending in ly | highly unlikely, environmentally friendly, potentially deadly
[xx] * without | not/n't + any word + without |
it's [j*] that | it's + adjective + that |
it is [v*] that we [vv*] | it is + verb + that we + lexical verb |
to [v*] or not to [v*] | to + verb + or not to + verb | to be or not to be; top 5 verbs: be, do, buy, tell, engage
[vv*] * into [v?g] | lexical verb + any word + into + V-ing | fool you into thinking, talked him into going, trick people into thinking
[=clean].[v*] the [n*] | verb synonyms of clean + the + noun | wiped the sweat, mopping the floor
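The two wildcards used in Table 5 patterns such as un*ly and *heart* behave much like a restricted regular expression. The sketch below is not COCA's implementation. It is a small stand-in that assumes * means any number of word characters and ? means exactly one, and it ignores part-of-speech tags, lemma brackets, and synonym queries. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a single-word pattern with * and ? into a regex (illustrative only)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")  # assumed: any number of word characters
        elif ch == "?":
            parts.append(r"\w")   # assumed: exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, word: str) -> bool:
    """True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for pattern, word in [("un*ly", "unlikely"), ("un*ly", "unlike"),
                      ("*heart*", "sweetheart"), ("r?n*", "running")]:
    print(pattern, word, matches(pattern, word))
```

Using fullmatch rather than search keeps the pattern anchored to the whole word, so un*ly does not match unlike.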

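To compare rob and steal, enter the two words with [nn*] as the CONTEXT, searching 0 words to the left and 3 to the right. With MIN FREQ set to 10, only collocates that occur at least 10 times are listed; with MIN FREQ set to 4, the threshold drops to 4.

In the results (Figure 4), banks occurs 112 times after rob (W1) and 5 times after steal (W2). W1 is 0.96 of the two counts combined (PERC-1), and PERC-1 is 1.91 times the share expected for rob (50%) (SCORE).

2.3 (D)

(D) shows the matches from (C) in context (KWIC), with SECTION information for each line.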
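The rob and steal figures can be checked with a few lines of arithmetic. The sketch below assumes that PERC-1 is W1 as a share of W1 + W2 and that SCORE is PERC-1 divided by the expected share of 50%, which is how the passage reads. The formula used on the COCA site may differ, and the function name is illustrative.

```python
def compare_scores(w1: int, w2: int, expected_share: float = 0.5) -> tuple[float, float]:
    """Return (PERC-1, SCORE) for two collocate counts, under the assumptions above."""
    perc1 = w1 / (w1 + w2)          # share of the combined count that belongs to word 1
    score = perc1 / expected_share  # how far that share exceeds the expected share
    return perc1, score


# banks after rob (W1 = 112) and after steal (W2 = 5), as reported for Figure 4.
perc1, score = compare_scores(112, 5)
print(f"PERC-1 = {perc1:.2f}, SCORE = {score:.2f}")
```

With these inputs the two values round to the 0.96 and 1.91 quoted for banks.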


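COCA and Google

A word typed into (WORD(S)) is searched with SEARCH, much as in Google. LIST returns the matches (2180), and changing DISPLAY to CHART (Figure 5) gives the frequency in each section per million words (PER MIL). SEE ALL SECTIONS shows every section. Unlike Google, COCA can also limit a search by (SECTION).

3.1

Searches on terrorism: terrorism; * terrorism (Davies, 2008); [v*] terrorism; [v*] * terrorism. The POS LIST in COCA offers 39 tags. The full CLAWS7 tagset used by COCA, 137 tags in all, is listed at http://ucrel.lancs.ac.uk/claws7tags.html.

3.2

In Figure 3, enter chip.[n*] in WORD(S) ([1]) and [n*] in CONTEXT ([2]), set ([3]) and ([4]) to 5, and click SEARCH. This returns the nouns found within 5 words to the left and 5 words to the right of chip: chocolate, computer, chip, cookies, Intel. Further searches are listed in Table 6.

3.3

For toilet, the query [v*] * toilet with [SORT] set to RELEVANCE returns verbs such as use and flush. distinguish.[v*] (CHART) shows how distinguish is distributed across sections. Entering [j*] action in WORD(S) with SECTION 1 set to FIC and SECTION 2 set to ACAD compares the adjectives used with action in the two genres.

[HELP...] opens help pages in (D), with More information. RESET clears the current search settings.

4

4.1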
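Table 6

WORD(S) [1] | CONTEXT [2] | [3]/[4] | SORT | Function
study.[n*] | [j*] | 1/0 | RELEVANCE | adjectives directly before the noun study: present, recent
with | [v*] | 1/0 | FREQUENCY | verbs directly before with
[=beautiful] | [=flower] | 5/5 | BOTH WORDS | synonyms of beautiful near synonyms of flower
small little | [nn*] | 0/3 | RELEVANCE | nouns within 3 words after small and after little
ground.[n*] floor.[n*] | [j*] | 3/0 | RELEVANCE | adjectives within 3 words before ground and before floor
statesman politician | [j*] | 2/0 | FREQUENCY | adjectives within 2 words before statesman and before politician
de*.[vvi*] | SECTION 1 = ACAD, SECTION 2 = FIC | | | verbs beginning with de in ACAD compared with FIC
[=smart] | SECTION 1 = NEWS, SECTION 2 = FIC | | | synonyms of smart in NEWS compared with FIC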


(SORT) in (B) governs the order of the results in (C). COCA searches can work on lemmas, as with [borrow] and [lend]. Besides COCA, Mark Davies provides other corpora, including the BNC and the Time Corpus (http://corpus.byu.edu/), and COCA can be set against search engines such as Google and Yahoo. Other tools mentioned: R; Wordcruncher; WordSmith. Mark Davies (Davies 2007; Davies 2008) describes the design behind COCA.

4.2

The COCA page could be more user-friendly (user-friendly): SEARCH STRING, CONTEXT, POS LIST, and CLICK TO SEE OPTIONS take some learning. COCA is compared with the BNC (BNC 70), and the Concordance in (D) with WordSmith.

Concordance lines show some tagging problems: one and one another appear under both pron. PERS and pron. INDF (13248 and 44 in COCA). COCA is tagged with CLAWS7, and the same word can carry different tags, as in brown.[np*] and brown.[j*].

References

[1] Davies, Mark. The 360 million word Corpus of Contemporary American English (1990-2007) [A]. Unpublished manuscript. Paper presented at American Association for Corpus Linguistics, 2008.
[2] Davies, Mark. (forthcoming) Relational databases as a robust architecture for the analysis of word frequency [A]. In AHRC ICT Methods Network: Expert Seminar on Linguistics: Word Frequency and Keyword Extraction [C], ed. Dawn Archer. Ashgate.
[3] Davies, Mark. Semantically-based queries with a joint BNC/WordNet database [A]. In Corpus Linguistics twenty-five years on, ed. Roberta Facchinetti. Amsterdam: Rodopi, 2007: 149-167.
[4] Davies, Mark. The advantage of using relational databases for large corpora: speed, advanced queries, and unlimited annotation [J]. International Journal of Corpus Linguistics, 2005, 10.

A Good Platform for English Teachers and Learners: the Corpus of Contemporary American English (COCA)

WANG Xing-fu (1), Mark Davies (2), LIU Guo-hui (3)

(1. College of Foreign Languages, Chongqing University, Chongqing 400030, China;
2. Brigham Young University, Provo, Utah, USA 84602;
3. College of Foreign Languages, Hangzhou Normal University, Hangzhou, Zhejiang 310036, China)

Abstract: This article introduces the Corpus of Contemporary American English (COCA), created by Prof. Mark Davies. The corpus contains 360 million words of material published in America from 1990 to 2007, and it is the largest balanced English corpus. It is free for researchers and English learners to use online.

Key words: Corpus of Contemporary American English; Balanced English Corpus

R, University of California, Santa Barbara, www.r-project.org/.
