
2011 9 25-53

Sketch Engine (SKE)

NTU Chinese-English Parallel Corpora

SKE t MI

t MI

2011 11 5 2011 7 9 2011 7 11

E-mail: FB061@mail.oit.tw

Compilation and Translation Review


Vol. 4, No. 2 (September 2011), 25-53

Corpus-Based Analysis for English-Chinese Translation of Passives in VOA News


Chien-hui Hung
When translating English into Chinese, translators are often unconsciously influenced by English syntax, which results in the westernization of translated Chinese. This study addresses this issue by employing scientific methods to examine the English-Chinese translation of passives in Voice of America (VOA) News. Based on the Sketch Engine Traditional Chinese Corpus and the NTU Chinese-English Parallel Corpora, we examine the quantitative characteristics of westernized Chinese in terms of word frequency and two statistical measures of collocation: Mutual Information (MI) and t-score. Sketch Engine tools such as concordance and word sketch were employed to observe and analyze collocation patterns in authentic Chinese. The goal is to help translators avoid unidiomatic and inauthentic Chinese, to advocate the use of both monolingual and bilingual corpora as important resources for improving translation quality, and to promote the use of statistics to describe the norms of the language in translation studies.
Keywords: CTS, passive construction, corpus, word frequency, collocability, norm
Received: November 5, 2010; Revised: July 9, 2011; Accepted: July 11, 2011

Chien-hui Hung, Assistant Professor, General Education Center, Oriental Institute of Technology, E-mail: FB061@mail.oit.edu.tw



He has been considered a front-runner in the contest.

2000
2

translationese Europeanized
3

Chinese Anglicized Chinese 1993

2007; 1997

1993

2002

2003 26


2006
6


2004

interlanguage

2000

2009

410

2002; 2001; 2003; 2004; 2000; 2001


1997 1997 1997 2006
2007 1995
1993

Olohan, 2004

Kučera & Francis (1967)


Brown Corpus 2002 corpora/corpus

(Biber, Conrad, & Reppen, 1998; O'Keeffe, McCarthy, & Carter, 2007); corpus


linguistics

Biber (1993); Biber, Conrad, & Reppen (1998)
McEnery, Xiao, & Tono (2006); Mona Baker
1990
CTS (Baker, 1993, 1995, 1996, 2000)

Xiao (2010)
passive constructions; Xiao; Baker
9

Xiao

normalisation/conservatism
target language
Xiao

Xiao

2010

Google
Google


Google 196
Google (Kilgarriff, 2007)
10

norm

target language
source language
11

12

NTU Chinese-English Parallel Corpora

NTU Chinese-English Parallel Corpora (Gao, 2011)
source language 14085307
7312431 7275332
16056353
13

80299 157410


Sketch Engine (SKE)

14
15

Sketch Engine (SKE) (Kilgarriff, Rychlý, Smrž, & Tugwell, 2004), Lexical Computing Ltd.
SKE
16

word frequency
collocation
concordance

1.

size

(1) Normalization (Biber, Conrad, & Reppen, 1998)


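Corpus 1: 12,000 tokens, 10 occurrences; corpus 2: 18,000 tokens, 12 occurrences.
Normalization base: 1,000,000
(10/12,000) × 1,000,000 ≈ 833
(12/18,000) × 1,000,000 ≈ 667

833 vs. 667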
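The normalization arithmetic above can be sketched in Python as follows. This is a minimal illustration assuming a base of 1,000,000 tokens, as in the example; the function name `normalize` is our own and not part of any corpus tool.

```python
def normalize(raw_count: int, corpus_size: int, base: int = 1_000_000) -> float:
    """Scale a raw frequency to occurrences per `base` tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * base

# The two corpora from the example: 10 hits in 12,000 tokens, 12 in 18,000.
a = normalize(10, 12_000)
b = normalize(12, 18_000)
print(f"{a:.1f} vs {b:.1f}")  # prints "833.3 vs 666.7"
```

Once both counts share the same base, they can be compared directly even though the raw count is higher in the larger corpus.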

(2) Difference Coefficient


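(Kennedy, 1998)

$$\text{Difference coefficient} = \frac{A - B}{A + B}$$

where A and B are the (normalized) frequencies of an item in corpus A and corpus B. The coefficient ranges from 1 to -1; a value of 1 means that the item occurs only in A, and -1 that it occurs only in B.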

17

0.5-0.5

18
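A minimal sketch of the difference coefficient, assuming the standard definition (A - B)/(A + B) with A and B taken as normalized frequencies; the helper name is hypothetical. Fed the normalized frequencies reported in Table 3 (2566 for VOA, 1217 for Sketch Engine), it reproduces the reported 0.36.

```python
def difference_coefficient(freq_a: float, freq_b: float) -> float:
    """(A - B) / (A + B): +1 = only in A, -1 = only in B, 0 = equally frequent."""
    total = freq_a + freq_b
    if total == 0:
        raise ValueError("both frequencies are zero")
    return (freq_a - freq_b) / total

# Normalized frequencies (per million) from Table 3.
print(round(difference_coefficient(2566, 1217), 2))  # prints 0.36
```

Using normalized rather than raw counts matters here: with raw counts (404 vs. 859,559) the coefficient would be close to -1 simply because the Sketch Engine corpus is far larger.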

(3) Log-Likelihood (LL) (Biber, 1993; Rayson & Garside, 2000)
LL A B
LL

LL LL
An LL value greater than 3.84 indicates a significant difference at p < 0.05.
LL
LL Log-Likelihood Calculator


LL (McEnery, Xiao, & Tono, 2006)
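The two-corpus log-likelihood statistic can be sketched as below, in the form described by Rayson and Garside (2000): the expected frequency of an item in each corpus is proportional to that corpus's size. This is our own illustrative implementation, not the code behind the Log-Likelihood Calculator mentioned above. Applied to the raw counts in Table 3, it gives a value of roughly 178, in line with the reported 178.16; the small gap presumably reflects rounding in the tool used.

```python
import math

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """LL for one item: a, b = its frequency in corpus 1 and 2;
    c, d = the sizes of corpus 1 and 2."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            total += observed * math.log(observed / expected)
    return 2 * total

CRITICAL_05 = 3.84  # chi-square critical value, 1 degree of freedom, p < 0.05

# Raw counts and corpus sizes from Table 3 (VOA vs. Sketch Engine).
ll = log_likelihood(404, 859_559, 157_410, 706_428_333)
print(f"LL = {ll:.2f}, significant at p < 0.05: {ll > CRITICAL_05}")
```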
2.

19

collocability
t MI
(Church & Hanks, 1990; Church, Gale, Hanks, & Hindle, 1991)
MI t
(1) Mutual Information (MI)

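$$I(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

where P(x) is the probability of word x, P(y) is the probability of word y, P(x,y) is the probability of x and y co-occurring, and I(x,y) is the MI value.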
MI MI
MI
MI

MI
MI (McEnery, Xiao, & Tono, 2006)
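A minimal sketch of MI computed from raw counts, assuming probabilities are estimated as relative frequencies (P(x) = f(x)/N, and likewise for y and the pair), so the formula reduces to log2(f(x,y)·N / (f(x)·f(y))). Sketch Engine's own implementation may differ (for instance in how the co-occurrence window is defined), so this toy function is not expected to reproduce the scores it reports. The counts below are invented for illustration.

```python
import math

def mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """MI = log2(P(x,y) / (P(x) * P(y))), with each P estimated as count / n.

    Substituting the estimates gives log2(f_xy * n / (f_x * f_y)).
    """
    if min(f_xy, f_x, f_y, n) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(f_xy * n / (f_x * f_y))

# Hypothetical counts in a 10,000-token corpus.
rare = mutual_information(2, 2, 2, 10_000)            # two words seen only together, twice
frequent = mutual_information(500, 1000, 2000, 10_000)
print(f"rare pair MI = {rare:.2f}, frequent pair MI = {frequent:.2f}")
```

The rare pair scores far higher than the frequent one, which illustrates why MI tends to favor low-frequency, highly exclusive combinations.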
(2) t-score

20

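$$t = \frac{f(x,y) - \dfrac{f(x)\,f(y)}{N}}{\sqrt{f(x,y)}}$$

where f(x) is the frequency of x, f(y) is the frequency of y, f(x,y) is the co-occurrence frequency of x and y, and N is the size of the corpus.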
t
A t-score greater than 1.645 is taken as significant at p < 0.05.
MI t
t
t MI
MI

MI t
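The t-score can be sketched the same way, as observed minus expected co-occurrence divided by the square root of the observed count, with expected = f(x)f(y)/N. The function name and counts are hypothetical, reusing the invented rare and frequent pairs from the MI sketch; this is a textbook version and not necessarily how Sketch Engine computes its scores.

```python
import math

def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t = (f_xy - f_x * f_y / n) / sqrt(f_xy)."""
    if f_xy <= 0 or n <= 0:
        raise ValueError("f_xy and n must be positive")
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

T_THRESHOLD = 1.645  # significance threshold used in the text (p < 0.05)

# Hypothetical rare and frequent pairs in a 10,000-token corpus.
rare = t_score(2, 2, 2, 10_000)
frequent = t_score(500, 1000, 2000, 10_000)
for label, t in (("rare", rare), ("frequent", frequent)):
    print(f"{label} pair: t = {t:.3f}, above threshold: {t > T_THRESHOLD}")
```

Here the rare pair fails the 1.645 threshold while the frequent pair passes easily, the reverse of their MI ranking. This reflects the commonly noted tendency of t-score to reward frequency and of MI to reward exclusivity.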

2001; 2003
concordance


Sketch Engine
word sketch

1995
1990

1
1
Sketch Engine

VOA (NTU), size 157410          Sketch Engine, size 706428333
Freq.     %                     Freq.      %
404       0.257%                859559     0.122%
245       0.156%                245305     0.035%
117       0.074%                13521      0.002%
63        0.040%                163583     0.023%
58        0.037%                3192       0.0005%
55        0.035%                231614     0.033%
46        0.029%                156111     0.022%
38        0.024%                412343     0.058%
24        0.015%                10864      0.002%
17        0.011%                6344       0.0009%
?         0.001%                1295       0.0002%


1 NTU Chinese-English Parallel Corpora


Sketch Engine
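The percentage columns of Table 1 are relative frequencies: raw count divided by corpus size, times 100. A short sketch, with constant and function names of our own choosing, reproduces the first two rows:

```python
# Corpus sizes as given in Table 1.
VOA_SIZE = 157_410
SKE_SIZE = 706_428_333

def percent(count: int, corpus_size: int) -> float:
    """Relative frequency of an item, expressed as a percentage."""
    return count / corpus_size * 100

# First row of Table 1.
voa_pct = percent(404, VOA_SIZE)
ske_pct = percent(859_559, SKE_SIZE)
print(f"VOA: {voa_pct:.3f}%  Sketch Engine: {ske_pct:.3f}%")  # 0.257% and 0.122%
```

Because both columns are scaled by their own corpus size, rows can be compared across the two corpora despite the difference of more than three orders of magnitude in size.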

Sketch Engine 7 1

404, Sketch Engine
859559, 0.257%
0.122%
Sketch Engine
412343
163583 245305 156111
231614
245
63, 55, 46, 38
2
2 SKE
SKE

VOA (total 404)       Sketch Engine (total 859559)
Freq.     %           Freq.      %
245       61%         245305     29%
63        16%         163583     19%
55        14%         231614     27%
46        11%         156111     18%
38        9%          412343     48%

Sketch Engine
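The Table 2 percentages express each row as a share of the column total (404 passive occurrences in VOA, 859,559 in Sketch Engine). The sketch below recomputes them; since the row labels were lost in extraction, rows are identified by position only. The shares in each column sum to more than 100% (the Sketch Engine column to 141%), so the categories evidently overlap.

```python
# Column totals and row counts from Table 2.
VOA_TOTAL = 404
SKE_TOTAL = 859_559
voa_counts = [245, 63, 55, 46, 38]
ske_counts = [245_305, 163_583, 231_614, 156_111, 412_343]

# Share of each row in its column total, rounded to whole percent.
voa_shares = [round(c / VOA_TOTAL * 100) for c in voa_counts]
ske_shares = [round(c / SKE_TOTAL * 100) for c in ske_counts]
print(voa_shares)       # [61, 16, 14, 11, 9]
print(ske_shares)       # [29, 19, 27, 18, 48]
print(sum(ske_shares))  # 141
```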



48% 404
389%

11%
18%; 14%, 27%;
16%, 19%

141
25%, 29%

BE +

SKE

3
SKE


                                      VOA        Sketch Engine
Corpus size                           157410     706428333
Raw frequency                         404        859559
Normalized frequency (per million)    2566       1217
Difference coefficient                0.36
Log-Likelihood (LL)                   178.16

3 SKE
SKE 2566
1217
2566 SKE 1217
SKE
SKE 0.36
LL 178.16
LL > 3.84

11659
50 SKE
2879, 1071, 1287
t


53.656, MI 8.279; t 35.873, MI 5.053; t 32.723, MI 3.510
t 1.645
MI

SKE
4
4 SKE

Freq.     t          MI
12003     109.458    3.930
11477     107.126    8.235
7710      87.771     4.478
2879      53.656     8.279
1403      37.455     5.369
1287      35.873     5.053
1071      32.723     3.510
1037      32.201     4.573
877       29.612     3.834
630       25.098     6.693
403       20.074     3.640
357       18.893     2.295
279       16.703     7.099
209       14.457     4.173
196       14         3.576
39        6.244      -1.95
63        7.928      -0.051
10        3.162      5.204
8         2.828      -3.710


4 Sketch Engine
t, MI
t 1.645 4
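To see how the two measures rank the same collocates differently, the sketch below loads the 19 rows of Table 4 (frequency, t-score, MI as reported by Sketch Engine), applies the t > 1.645 threshold, and sorts by each measure. The Chinese collocates were lost in extraction, so rows are referred to by their position (1 to 19) in the table; the variable names are our own.

```python
# (frequency, t-score, MI) for the 19 rows of Table 4, in table order.
ROWS = [
    (12003, 109.458, 3.930), (11477, 107.126, 8.235), (7710, 87.771, 4.478),
    (2879, 53.656, 8.279), (1403, 37.455, 5.369), (1287, 35.873, 5.053),
    (1071, 32.723, 3.510), (1037, 32.201, 4.573), (877, 29.612, 3.834),
    (630, 25.098, 6.693), (403, 20.074, 3.640), (357, 18.893, 2.295),
    (279, 16.703, 7.099), (209, 14.457, 4.173), (196, 14.0, 3.576),
    (39, 6.244, -1.95), (63, 7.928, -0.051), (10, 3.162, 5.204),
    (8, 2.828, -3.710),
]
T_THRESHOLD = 1.645  # threshold used in the text (p < 0.05)

positions = range(1, len(ROWS) + 1)
passing = [i for i in positions if ROWS[i - 1][1] > T_THRESHOLD]
by_t = sorted(positions, key=lambda i: ROWS[i - 1][1], reverse=True)
by_mi = sorted(positions, key=lambda i: ROWS[i - 1][2], reverse=True)

print(f"{len(passing)} of {len(ROWS)} rows pass t > {T_THRESHOLD}")
print("top 5 by t: ", by_t[:5])   # [1, 2, 3, 4, 5]
print("top 5 by MI:", by_mi[:5])  # [4, 2, 13, 10, 5]
```

Every row passes the t threshold, but the top of the MI ranking differs markedly from the top of the t ranking, which is why the two measures are reported side by side.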

t
MI

MI
t 2004 136

21

Ten people are confirmed dead.


10
Sketch Engine 157
20
50

SKE


He has been considered a front-runner in the contest.

SKE 12003 4
Sketch Engine
634991
1193
More bodies are found.

Many have been released.


SKE

SKE
17137
MI 6.15
MI 5.136


1403 MI 5.369

SKE
10125 MI 14.136

Elder statesman Shimon Peres was named a deputy prime minister.

SKE

SKE

The course was being cleared.

Sketch Engine

More than 57-hundred people in 11 Indian states have been affected.
11 5 7
4


877, 1052
MI 5.905 MI 3.834
11 5 7

It was hardly noticed.


:
SKE
39 MI
180 MI
2.064

It might be eaten also.

SKE
MI
unaccusative construction

SKE


MI

SKE
LL
SKE

be

unaccusative construction

SKE


SKE

MI 0
Sketch Engine

SKE
SKE

SKE

t MI

22

n-gram

concordance, word sketch
23
24

norm

1. NTU Chinese-English Parallel Corpora


2. translationese (Nida, 1959)
3.

2005
4. 1995

5.

2002 61
6. 2006 224
7. Selinker (1972): interlanguage (IL)
8. A corpus is a principled collection of texts. A corpus is a collection of electronic texts usually stored on a computer. A corpus is available for qualitative and quantitative analysis (O'Keeffe, McCarthy, & Carter, 2007, pp. 1-2).
9. simplification, explicitation,
normalisation/conservatism, levelling out
2005
10. Hatim defines norm as "the conventions (in the sense of implicitly agreed-upon standards) of acceptable content and rhetorical organization" (2001, p. 231).


11. Second Language Acquisition (SLA), Foreign Language Teaching (FLT), Computer Learner Corpus (CLC) (Granger, 2004)
12.
13. http://www.voafanti.com/gate/big5/www.voanews.com/chinese/
14.
15. http://www.sketchengine.co.uk/
16. SKE SKE SKE

17.
18. Chi-squared (χ²) test
Chi-squared, LL, Chi-squared test
(Rayson, Berridge, & Francis, 2004); LL

19. 2002 155


20. Z (Hunston, 2002); Z, t

21. NTU Chinese-English Parallel Corpora


22. 2005; n-gram
23.
24.

2009
406-416
2002
2005

1995
1990


2001
2003
1993
1993

1997
2005
9161-196
2004
133-144
2000
2001
2006

2007
C. N. Li & S. A. Thompson

2002
20103
2163-202
1997
1997
8.1- 8.10
2005 9 405-426
Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honor of John Sinclair (pp. 233-250). Amsterdam & Philadelphia:
John Benjamins.
Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target, 7(2), 223-243.
Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead.
In H. Somers (Ed.), Terminology, LSP and translation (pp. 175-186). Amsterdam & Philadelphia: John Benjamins.
Baker, M. (2000). Towards a methodology for investigating the style of a literary
translator. Target, 12(2), 241-266.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243-257.


Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Church, K.W., & Hanks, P. (1990). Word association norms, mutual information,
and lexicography. Computational Linguistics, 16, 22-29.
Church, K. W., Gale, W. A., Hanks, P., & Hindle, D. (1991). Using statistics in
lexical analysis. In U. Zernik (Ed.), Lexical acquisition: Using on-line resources
to build a lexicon (pp. 115-164). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gao, Z.-M. (2011). Exploring the effects and use of a Chinese-English parallel concordancer. Computer Assisted Language Learning, 24(3), 255-275.
Granger, S. (2004). Computer learner corpus research: Current status and future prospects. In U. Connor & T. A. Upton (Eds.), Applied corpus linguistics: A multidimensional perspective (pp. 123-145). Amsterdam & Atlanta: Rodopi.
Hatim, B. (2001). Teaching and researching translation. Harlow: Pearson Education
Limited.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Kennedy, G. (1998). An introduction to corpus linguistics. London: Longman.
Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. (2004). The Sketch Engine. In Proceedings of EURALEX 2004 (pp. 105-116). Lorient, France.
Kilgarriff, A. (2007). Googleology is bad science. Computational Linguistics, 33(1),
145-151.
McEnery, A., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. London and New York: Routledge.
Nida, E. A. (1959). Principles of translation as exemplified by Bible translating. In
R. A. Brower (Ed.), On translation (pp. 11-31). Cambridge: Harvard University Press.
Olohan, M. (2004). Introducing corpora in translation studies. London and New
York: Routledge.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press.
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. In Proceedings of the Workshop on Comparing Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics (pp. 1-6).
Rayson, P., Berridge, D., & Francis, B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Le poids des mots: Proceedings of the 7th International Conference on Statistical Analysis of Textual Data, Louvain-la-Neuve, Belgium, March 10-12, 2004 (Vol. II, pp. 926-936). Presses universitaires de Louvain.
Selinker, L. (1972). Interlanguage. IRAL, 10(3), 209-231.
Xiao, R. (2010). How different is translated Chinese from native Chinese? A
corpus-based study of translation universals. International Journal of Corpus
Linguistics, 15(1), 5-35.


NTU Chinese-English Parallel Corpora

NTU Chinese-English Parallel Corpora (VOA):


Sketch Engine (SKE)

SKE Concordance


SKE Word Sketch

