H313 A 1003-6105 2010 03-0249-09
2001
1 152
O
E
2
1
Kilgarriff
1996 2001
H0 w
X Y
w
O
E 2
2 P
P O E
H0
H0 H1
2
2 788
2 995 w
X Y
21 X Y w
*
6
n1 n2
n1 R
n1
Kilgarriff 2001
Church Gale 1995 167 LOB Brown
30 5 733
23 3 418 975
Mann-Whitney
Wilcoxon Conover 2006 195
Larson Farber
2003 559
3
31
1
2
R
R 3 32
10 Brown 100
500 2000
4
CLEC Chinese Learners
English Corpus
5 2003 100
30
0 4 567
4 567
token
1 013
319 1 071 878 2 000
2002 3 CLEC 1
070 602 0
10 50
100 000 500
10
CLEC 4 567
Brown 20 10
CLEC
2 000 2 100
4
528 41
Brown
100 500
1 015 421
33 z
Kilgarriff 2001246 n1 n2 10 4 5 6
30 R
30 1
5 1
0 4 567
2
      95%          97.5%        99%          99.5%
χ²    3.84         5.02         6.64         7.88
z     1.96         2.24         2.58         2.81
R     7907-13093   7536-13464   7093-13907   6782-14218
95%              97.5%            99%              99.5%
3526 (77.20%)    3383 (74.07%)    3194 (69.94%)    3062 (67.05%)
3526 (77.20%)    3388 (74.18%)    3207 (70.22%)    3077 (67.37%)
2892 (63.32%)    2605 (57.04%)    2220 (48.61%)    1888 (41.34%)
2
3 3
95%    97.5%  99%    99.5%
2890   2604   2219   1888
636    779    975    1174
0      5      13     15
2      1      1      0
3
95
dictionary
Brown 57 CLEC 67 contain Kilgarriff 2001 235-37
Brown 45 CLEC 29 BNC
975 99
dictionary
995
42
2 580 232
Rayson et al
2004 825 90
999 9999
999
96
2 812
1083 4
z 2055 7781
13219
30 128004 56621 362114 109 1855 894 1388
30 2630 12697 20903 114 42 557 661
30 1580 10265 13829 116 357 225 06
4
5 Brown
5 3
4 5
1 2
1 fake 10 1452 commodities 21 1301
2 whole 309 407 next 394 291
3 to 26158 30443 for 9489 8737
4 feed 122 31 television 50 221
5 smokers 1 29 carbon 30 1
6 hearts 23 43 mention 50 26
5 6 Brown
feed 23 26 television 36 31 smokers
1 6 carbon 7 1
500
10
4
6
Brown
hearts 20 35
mention 45 24
2002 160
2 3
1
10 000 1 0009999 100999 5099 3049
2
40 33999 12299 1119
Conover 2006
Conover W J 2006
M
Davison, A. C. 2008. Statistical Models [M]. Cambridge: Cambridge University Press.
the of
De Cock, S. 2000. Repetitive phrasal chunkiness and advanced EFL speech and writing [A]. In C. Mair & M. Hundt (eds.). Corpus Linguistics and
assumptions and the conditions of lexical differences, the chi-square test, which is the main
statistical technique for comparing lexical differences between corpora, is likely to produce
statistical errors when applied to this kind of task. Therefore, this study also applies other
statistical techniques, including the log-likelihood ratio test and the rank sum test, to lexical
differences between corpora. As the analysis indicates, the log-likelihood ratio test behaves much
like the chi-square test in examining lexical differences between corpora: both tend to cause
statistical errors due to such factors as sample size and sample representativeness. The rank sum
test, however, can solve some of these problems and yields relatively objective statistical
results.
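
The three tests named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names, the 2x2 table layout (word frequency versus all other tokens, in two corpora) and the sample figures are hypothetical, and the rank sum z score uses the normal approximation without a tie correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: number of all other tokens in corpus 1 and corpus 2
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def log_likelihood_2x2(a, b, c, d):
    """Log-likelihood ratio statistic (G2) for the same 2x2 table."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def rank_sum(x, y):
    """Rank sum R of sample x in the pooled ranking, and its z score.

    x, y: per-sample frequencies of one word in corpus 1 and corpus 2.
    Tied values receive midranks; the variance is NOT tie-corrected.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        r += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r, (r - mean) / sd


if __name__ == "__main__":
    # Hypothetical word: 10 hits in 100 tokens vs. 20 hits in 100 tokens.
    print("chi-square:", chi_square_2x2(10, 20, 90, 80))
    print("log-likelihood:", log_likelihood_2x2(10, 20, 90, 80))
    # Hypothetical per-sample counts of the same word in each corpus.
    print("rank sum, z:", rank_sum([0, 1, 1, 2, 3], [2, 3, 4, 4, 6]))
```

The first two statistics are compared with a chi-square critical value for one degree of freedom (3.84 at the 95% level), while the rank sum z score is compared with a normal critical value (1.96 at the 95% level). Because the rank sum test works on per-sample counts rather than pooled totals, a word concentrated in a few samples influences it less than it influences the two table-based tests.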