Sketch Engine (SKE)
SKE t MI
t MI
E-mail: FB061@mail.oit.tw
He has been considered a front-runner in the contest.
2000
2
translationese Europeanized
3
20071997
1993
2002
2003 26
2006
6
2004
interlanguage
2000
2009
410
Olohan, 2004
linguistics
Biber (1993); Biber, Conrad, & Reppen (1998); McEnery, Xiao, & Tono (2006); Mona Baker
1990
CTS (Baker, 1993, 1995, 1996, 2000)
Xiao2010
passive constructionsXiao Baker
9
Xiao
normalisation/conservatism
target language
Xiao
Xiao
2010
Google
Google
Google 196
Google (Kilgarriff, 2007)
10
norm
target language
source language
11
12
80299 157410
Sketch Engine(SKE)
14
15
Sketch Engine (SKE; Kilgarriff, Rychlý, Smrž, & Tugwell, 2004), Lexical Computing Ltd
SKE
16
word frequency
collocation
concordance
1.
size
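For example, suppose a word occurs 10 times in a corpus of 12,000 words and 12 times in a corpus of 18,000 words. Normalized to a common base of 1,000,000 words, the two frequencies are:
(10/12000) × 1,000,000 ≈ 833
(12/18000) × 1,000,000 ≈ 667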
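The normalization step can be sketched in Python as follows. The function name and the default base of one million are illustrative choices, not something the text prescribes; the counts are the ones from the worked example (10 occurrences in 12,000 and 12 occurrences in 18,000).

```python
def normalized_frequency(count: int, corpus_size: int, base: int = 1_000_000) -> float:
    """Return the number of occurrences per `base` units of corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * base


# Counts taken from the worked example in the text.
freq_a = normalized_frequency(10, 12_000)
freq_b = normalized_frequency(12, 18_000)

print(round(freq_a), round(freq_b))  # 833 667
```

Normalizing first makes the two counts comparable even though the corpora differ in size: the smaller raw count (10) turns out to be the higher relative frequency.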
A B A
B
1-1 A
A
17
0.5-0.5
18
LL LL
LL > 3.84
p < 0.05 (LL)
LL Log-Likelihood Calculator
LL
McEnery, Xiao, & Tono, 2006
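For reference, the log-likelihood statistic for comparing one item's frequency across two corpora is commonly written as below, following Rayson and Garside (2000). It is an assumption, not something the surviving text confirms, that the Log-Likelihood Calculator mentioned here uses exactly this form. The symbols are introduced only for this sketch: O_i is the observed frequency in corpus i, N_i is the size of corpus i, and E_i is the expected frequency under the hypothesis that the item is equally frequent in both corpora.

```latex
E_i = \frac{N_i \,(O_1 + O_2)}{N_1 + N_2},
\qquad
LL = 2 \sum_{i=1}^{2} O_i \ln \frac{O_i}{E_i}
```

With one degree of freedom, LL is compared against the chi-squared critical value, which is 3.84 at p < 0.05.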
2.
19
collocability
t MI
(Church & Hanks, 1990; Church, Gale, Hanks, & Hindle, 1991)
MI t
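(1) Mutual Information (MI)
$$ I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)} $$
where P(x) and P(y) are the probabilities of words x and y occurring in the corpus, P(x, y) is the probability of x and y co-occurring, and I(x, y) is the MI score.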
MI MI
MI
MI
MI
MI (McEnery, Xiao, & Tono, 2006)
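A minimal Python sketch of the MI score follows, assuming the probabilities are estimated as relative frequencies in a corpus of n tokens. The function and parameter names are hypothetical, and the counts in the example are invented purely for illustration; they do not come from the study.

```python
import math


def mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """MI = log2(P(x,y) / (P(x) * P(y))) with P estimated as frequency / n.

    With relative-frequency estimates the ratio simplifies to
    f_xy * n / (f_x * f_y), which is what is computed here.
    """
    if min(f_xy, f_x, f_y, n) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(f_xy * n / (f_x * f_y))


# Invented counts: x occurs 200 times, y 500 times, and the pair
# co-occurs 20 times in a corpus of 1,000,000 tokens.
mi = mutual_information(f_xy=20, f_x=200, f_y=500, n=1_000_000)
print(round(mi, 3))
```

An MI of 0 means the pair co-occurs exactly as often as chance predicts; positive values indicate attraction and negative values indicate that the pair co-occurs less often than expected.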
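(2) t-scores (t)
20
$$ t = \frac{f(x, y) - \dfrac{f(x)\,f(y)}{N}}{\sqrt{f(x, y)}} $$
where f(x) and f(y) are the frequencies of words x and y in the corpus, f(x, y) is the frequency with which x and y co-occur, and N is the size of the corpus.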
t
t > 1.645
p < 0.05
MI t
t
t MI
MI
MI t
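The t-score can be sketched in the same way. This is an illustration under the definitions given for f(x), f(y), f(x, y) and N; the function name is hypothetical and the counts are invented. The 1.645 cut-off is the value the text associates with p 0.05.

```python
import math

T_CRITICAL = 1.645  # threshold mentioned in the text for p 0.05


def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t = (observed - expected) / sqrt(observed), where the expected
    co-occurrence frequency is f(x) * f(y) / N."""
    if f_xy <= 0 or n <= 0:
        raise ValueError("f_xy and n must be positive")
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


# Invented counts: x occurs 200 times, y 500 times, and the pair
# co-occurs 20 times in a corpus of 1,000,000 tokens.
t = t_score(f_xy=20, f_x=200, f_y=500, n=1_000_000)
print(round(t, 3), t > T_CRITICAL)
```

Because the denominator is the square root of the observed co-occurrence count, t cannot exceed that square root, so the measure tends to rank frequent pairs highly, whereas MI can give high scores to rare pairs.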
20012003
concordance
Sketch Engine
word sketch
1995
1990
1
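Table 1
(Row labels are missing from the source text; rows are numbered in their original order.)

Row   Corpus of 157410          Sketch Engine corpus of 706428333
      Freq.       %             Freq.       %
1     404         0.257%        859559      0.122%
2     245         0.156%        245305      0.035%
3     117         0.074%        13521       0.002%
4     63          0.040%        163583      0.023%
5     58          0.037%        3192        0.0005%
6     55          0.035%        231614      0.033%
7     46          0.029%        156111      0.022%
8     38          0.024%        412343      0.058%
9     24          0.015%        10864       0.002%
10    17          0.011%        6344        0.0009%
11    [missing]   0.001%        1295        0.0002%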
Sketch Engine 7 1
404Sketch Engine
8595590.257%
0.122%
Sketch Engine
412343
163583 245305 156111
231614
245
63554638
2
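Table 2 (SKE)
(Row labels are missing from the source text; rows are numbered in their original order.)

Row     Corpus of 157410        Sketch Engine (SKE) corpus
        Freq.     Share         Freq.       Share
Total   404                     859559
1       245       61%           245305      29%
2       63        16%           163583      19%
3       55        14%           231614      27%
4       46        11%           156111      18%
5       38        9%            412343      48%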
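The percentages in Table 2 are consistent with each row's frequency divided by the first-row figure for the same corpus (404 and 859559). The sketch below recomputes them; the variable names are hypothetical, and the rows are identified only by position because their labels are not available in the text.

```python
# Frequencies from Table 2, in row order (row labels unavailable).
study_total, ske_total = 404, 859_559
study_counts = [245, 63, 55, 46, 38]
ske_counts = [245_305, 163_583, 231_614, 156_111, 412_343]


def shares(counts, total):
    """Return each count as a whole-number percentage of `total`."""
    return [round(c / total * 100) for c in counts]


study_shares = shares(study_counts, study_total)
ske_shares = shares(ske_counts, ske_total)

for i, (s, k) in enumerate(zip(study_shares, ske_shares), start=1):
    print(f"row {i}: {s}% vs {k}%")
```

Expressing each row as a share of the total, rather than as a raw count, is what allows the internal distribution of the two corpora to be compared despite the very large difference in their sizes.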
Sketch Engine
48% 404
389%
11%
18%;14%27%;
16%19%
141
25%29%
BE +
SKE
3
SKE
                              Corpus of 157410    Sketch Engine
Corpus size                   157410              706428333
Frequency                     404                 859559
Frequency (per million)       2566                1217
[label missing]               0.36
Log-Likelihood (LL)           178.16
3 SKE
SKE2566
1217
2566 SKE 1217
SKE
SKE 0.36
LL 178.16
LL 3.18
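The figures reported for this comparison can be rechecked with a short script. This is a sketch under two assumptions that the surviving text does not state explicitly: that the value 0.36 is the ratio (A - B) / (A + B) of the two normalized frequencies, which is an inference from the numbers, and that LL follows the two-corpus formulation of Rayson and Garside (2000). Function and variable names are hypothetical.

```python
import math


def per_million(count: int, size: int) -> float:
    """Frequency normalized to occurrences per 1,000,000."""
    return count / size * 1_000_000


def log_likelihood(o1: int, n1: int, o2: int, n2: int) -> float:
    """Two-corpus log-likelihood: 2 * sum(O_i * ln(O_i / E_i)),
    where E_i = N_i * (O_1 + O_2) / (N_1 + N_2)."""
    e1 = n1 * (o1 + o2) / (n1 + n2)
    e2 = n2 * (o1 + o2) / (n1 + n2)
    return 2 * (o1 * math.log(o1 / e1) + o2 * math.log(o2 / e2))


# Raw counts and corpus sizes as reported in the text.
study_count, study_size = 404, 157_410
ske_count, ske_size = 859_559, 706_428_333

a = per_million(study_count, study_size)  # reported as 2566
b = per_million(ske_count, ske_size)      # reported as 1217
ratio = (a - b) / (a + b)                 # assumed to be the reported 0.36
ll = log_likelihood(study_count, study_size, ske_count, ske_size)

print(f"{a:.1f} {b:.1f} {ratio:.2f} {ll:.1f}")
```

Under these assumptions the recomputed values land close to the reported 2566, 1217, 0.36 and 178.16. An LL of this size is far above 3.84, the chi-squared critical value for p < 0.05 with one degree of freedom.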
11659
50 SKE
287910711287
t
SKE
4
Table 4 (SKE)
(Row labels are missing from the source text; rows are numbered in their original order.)

Row   Freq.    t          MI
1     12003    109.458    3.930
2     11477    107.126    8.235
3     7710     87.771     4.478
4     2879     53.656     8.279
5     1403     37.455     5.369
6     1287     35.873     5.053
7     1071     32.723     3.510
8     1037     32.201     4.573
9     877      29.612     3.834
10    630      25.098     6.693
11    403      20.074     3.640
12    357      18.893     2.295
13    279      16.703     7.099
14    209      14.457     4.173
15    196      14         3.576
16    39       6.244      -1.95
17    63       7.928      -0.051
18    10       3.162      5.204
19    8        2.828      -3.710
4 Sketch
Enginet MI
t 1.645 4
t
MI
MI
t 2004 136
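The two measures in Table 4 can be screened against their usual reference points with a few lines of Python. The sketch below uses the table's own figures and the 1.645 t threshold mentioned in the text; treating a negative MI as "co-occurs less often than chance" follows from the MI definition rather than from any cut-off stated here. Rows are identified by position only, and the names are hypothetical.

```python
T_CRITICAL = 1.645  # t threshold mentioned in the text

# (frequency, t-score, MI) for the 19 rows of Table 4, in row order.
rows = [
    (12003, 109.458, 3.930), (11477, 107.126, 8.235), (7710, 87.771, 4.478),
    (2879, 53.656, 8.279), (1403, 37.455, 5.369), (1287, 35.873, 5.053),
    (1071, 32.723, 3.510), (1037, 32.201, 4.573), (877, 29.612, 3.834),
    (630, 25.098, 6.693), (403, 20.074, 3.640), (357, 18.893, 2.295),
    (279, 16.703, 7.099), (209, 14.457, 4.173), (196, 14.0, 3.576),
    (39, 6.244, -1.95), (63, 7.928, -0.051), (10, 3.162, 5.204),
    (8, 2.828, -3.710),
]

# Rows whose t-score exceeds the threshold.
above_t = [r for r in rows if r[1] > T_CRITICAL]

# Rows whose MI is negative, listed by 1-based row number.
negative_mi_rows = [i for i, r in enumerate(rows, start=1) if r[2] < 0]

print(len(above_t), negative_mi_rows)
```

Every row clears the t threshold, while three low-frequency rows have negative MI, which illustrates why the two measures are read together rather than in isolation.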
21
SKE
SKE 12003 4
Sketch Engine
634991
1193
More bodies are
found
SKE
SKE
17137
MI 6.15
MI 5.136
1403 MI 5.369
SKE
10125 MI 14.136
SKE
SKE
Sketch Engine
8771052
MI 5.905 MI 3.834
11 5 7
SKE
MI
unaccusative construction
SKE
MI
SKE
LL
SKE
be
unaccusative construction
SKE
SKE
MI 0
Sketch Engine
SKE
SKE
SKE
t MI
22
n-gram
concor-
23
24
norm
2005
4. 1995
5.
2002 61
6. 2006 224
7. Selinker (1972): interlanguage (IL)
8. A corpus is a principled collection of texts. A corpus is a collection of electronic texts usually stored on a computer. A corpus is available for qualitative and quantitative analysis (O'Keeffe, McCarthy, & Carter, 2007, pp. 1-2).
9. simplification, explicitation, normalisation/conservatism, levelling out
2005
10. Hatim defines norm as "the conventions (in the sense of implicitly agreed-upon standards) of acceptable content and rhetorical organization" (2001, p. 231).
11. Second Language Acquisition (SLA); Foreign Language Teaching (FLT); Computer Learner Corpus (CLC) (Granger, 2004)
12.
13. http://www.voafanti.com/gate/big5/www.voanews.com/chinese/
14.
15. http://www.sketchengine.co.uk/
16. SKE SKE SKE
17.
18. Chi-squared (χ²) test
Chi-squared L L Chi-squared Test
Rayson, Berridge, & Francis, 2004 LL
2009
406-416
2002
2005
1995
1990
2001
2003
1993
1993
1997
2005
9161-196
2004
133-144
2000
2001
2006
2007
C. N. Li & S. A. Thompson
2002
20103
2163-202
1997
1997
8.1- 8.10
2005 9 405426
Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233-250). Amsterdam & Philadelphia: John Benjamins.
Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target, 7(2), 223-243.
Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead. In H. Somers (Ed.), Terminology, LSP and translation (pp. 175-186). Amsterdam & Philadelphia: John Benjamins.
Baker, M. (2000). Towards a methodology for investigating the style of a literary translator. Target, 12(2), 241-266.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243-257.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16, 22-29.
Church, K. W., Gale, W. A., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), Lexical acquisition: Using on-line resources to build a lexicon (pp. 115-164). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gao, Zhao-Ming. (2011). Exploring the effects and use of a Chinese-English parallel concordancer. Computer-Assisted Language Learning, 24(3), 255-275.
Granger, S. (2004). Computer learner corpus research: Current status and future prospects. Applied corpus linguistics: A multidimensional perspective (pp. 123-145). Amsterdam & Atlanta: Rodopi.
Hatim, B. (2001). Teaching and researching translation. Harlow: Pearson Education Limited.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Kennedy, G. (1998). An introduction to corpus linguistics. London: Longman.
Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. (2004). The Sketch Engine. Proceedings of Euralex (pp. 105-116). Lorient, France: publisher.
Kilgarriff, A. (2007). Googleology is bad science. Computational Linguistics, 33(1), 145-151.
McEnery, A., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. London and New York: Routledge.
Nida, E. A. (1959). Principles of translation as exemplified by Bible translating. In R. A. Brower (Ed.), On translation (pp. 11-31). Cambridge: Harvard University Press.
Olohan, M. (2004). Introducing corpora in translation studies. London and New York: Routledge.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press.
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. In Proceedings of the Workshop on Comparing Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics (pp. 1-6).
Rayson, P., Berridge, D., & Francis, B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Le poids des mots: Proceedings of the 7th International Conference on Statistical Analysis of Textual Data, Louvain-la-Neuve, Belgium, March 10-12, 2004 (Vol. II, pp. 926-936). Presses universitaires de Louvain.
Selinker, L. (1972). Interlanguage. IRAL, 10(3), 209-231.
Xiao, R. (2010). How different is translated Chinese from native Chinese? A corpus-based study of translation universals. International Journal of Corpus Linguistics, 15(1), 5-35.
SKE Concordance
SKE Word Sketch