ISCOL 2014
University of Haifa, 7 September 2014
Introduction
ORIGINAL OR TRANSLATION?
EXAMPLE (O OR T?)
EXAMPLE (T OR O?)
Translationese
TRANSLATIONESE
THE LANGUAGE OF TRANSLATED TEXTS
Methodology
METHODOLOGY
Corpus-based approach
Text classification with machine-learning techniques
Feature design
Evaluation
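A minimal sketch of the four steps above, under loud assumptions: the toy corpus, the marker-word list, the nearest-centroid classifier, and the leave-one-out evaluation are all illustrative choices made to keep the example standard-library only. They are not the configuration used in the work presented here.

```python
import math
from collections import Counter

# Hypothetical marker words; a real feature set would be far larger.
MARKERS = ["the", "of", "that", "however", "therefore"]

# Toy corpus (invented for illustration): O = original, T = translated.
SAMPLES = [
    ("the cat sat on the mat", "O"),
    ("a dog ran to the park", "O"),
    ("the rain fell all day", "O"),
    ("however the cat sat therefore on the mat", "T"),
    ("therefore a dog ran however to the park", "T"),
    ("however the rain fell therefore all day", "T"),
]

def features(text):
    """Feature design: relative frequency of each marker word."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in MARKERS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

def leave_one_out_accuracy(samples):
    """Evaluation: hold out each sample once, train on the rest."""
    correct = 0
    for i, (text, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        by_label = {}
        for t, l in train:
            by_label.setdefault(l, []).append(features(t))
        centroids = {l: centroid(vs) for l, vs in by_label.items()}
        correct += classify(features(text), centroids) == label
    return correct / len(samples)

print(leave_one_out_accuracy(SAMPLES))
```

Swapping `features` for a different feature set while keeping the rest fixed is what makes per-feature accuracies comparable.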
IDENTIFYING TRANSLATIONESE
USING TEXT CLASSIFICATION
Research Contributions
RESEARCH CONTRIBUTIONS
Understanding the features of translationese; testing Translation
Studies hypotheses (Volansky et al., Forthcoming; Avner et al.,
Forthcoming)
Robust classification of translationese (Twitto-Shmuel et al.,
Forthcoming)
Language models for statistical machine translation
(Lembersky et al., 2011, 2012b)
Translation models for statistical machine translation
(Kurokawa et al., 2009; Lembersky et al., 2012a, 2013)
Automatic detection of machine translated texts (Aharoni et al.,
2014)
Identifying the first language of non-native writers (Tsvetkov et al.,
2013)
Shuly Wintner (University of Haifa)
Methodology
THE FEATURES OF TRANSLATIONESE
Hypotheses
HYPOTHESES
Features
FEATURES
Results
Category  Feature         Accuracy (%)
Sanity    Token unigrams  100
          Token bigrams   100
RESULTS: SIMPLIFICATION
Category        Feature                Accuracy (%)
Simplification  TTR (1)                72
                TTR (2)                72
                TTR (3)                76
                Mean word rank (1)     69
                Mean word rank (2)     77
                N most frequent words  64
                Mean word length       66
                Syllable ratio         61
                Lexical density        53
                Mean sentence length   65
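Three of the simplification features listed above can be computed from surface text alone. The sketch below is an illustration only: the slides do not define the three TTR variants, so it shows the plain type-token ratio (distinct words divided by total words), and the crude regex tokenizer and sentence splitter are assumptions.

```python
import re

def tokenize(text):
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Distinct word types divided by total word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens)

def mean_word_length(text):
    """Average number of characters per word token."""
    tokens = tokenize(text)
    return sum(len(t) for t in tokens) / len(tokens)

def mean_sentence_length(text):
    """Average number of word tokens per sentence (split on . ! ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

text = "The cat sat. The cat ran away."
print(type_token_ratio(text), mean_word_length(text), mean_sentence_length(text))
```

Each function yields a single number per text chunk, so each of these features gives the classifier a one-dimensional input.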
RESULTS: EXPLICITATION
Category       Feature               Accuracy (%)
Explicitation  Cohesive Markers      81
               Explicit naming       58
               Single naming         56
               Mean multiple naming  54
RESULTS: NORMALIZATION
Category       Feature        Accuracy (%)
Normalization  Repetitions    55
               Contractions   50
               Average PMI    52
               Threshold PMI  66
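The two PMI features rest on pointwise mutual information between adjacent words, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below is a rough illustration, not the setup behind the numbers above: estimating the probabilities from the same token sequence, averaging over bigram tokens, and the default threshold of 0 are all assumptions.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Map each adjacent word pair to log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    return {
        (x, y): math.log2(
            (c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni))
        )
        for (x, y), c in bigrams.items()
    }

def average_pmi(tokens):
    """Mean PMI over all bigram occurrences in the sequence."""
    pmi = bigram_pmi(tokens)
    pairs = list(zip(tokens, tokens[1:]))
    return sum(pmi[p] for p in pairs) / len(pairs)

def count_above_threshold(tokens, threshold=0.0):
    """Number of bigram occurrences whose PMI exceeds the threshold."""
    pmi = bigram_pmi(tokens)
    return sum(1 for p in zip(tokens, tokens[1:]) if pmi[p] > threshold)

tokens = "new york is new".split()
print(average_pmi(tokens), count_above_threshold(tokens))
```

In practice the probabilities would more plausibly come from a large reference corpus than from the chunk being classified.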
RESULTS: INTERFERENCE
Category      Feature                     Accuracy (%)
Interference  POS unigrams                90
              POS bigrams                 97
              POS trigrams                98
              Character unigrams          85
              Character bigrams           98
              Character trigrams          100
              Prefixes and suffixes       80
              Contextual function words   100
              Positional token frequency  97
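Character n-grams, among the strongest interference features listed above, need no linguistic preprocessing. A minimal sketch of extracting them follows; the function names, the decision to keep spaces inside n-grams, and the optional `top_k` cut-off are assumptions made for illustration.

```python
from collections import Counter

def char_ngrams(text, n):
    """All overlapping character n-grams, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_profile(text, n, top_k=None):
    """Relative frequency of the top_k most common n-grams (all if None)."""
    grams = char_ngrams(text, n)
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.most_common(top_k)}

print(char_ngrams("abcab", 2))
print(ngram_profile("abcab", 2))
```

The same two functions work for word or POS n-grams if `text` is replaced by a tuple of tokens or tags and the slices are kept as tuples.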
Category      Feature                     Accuracy (%)
Interference  POS bigrams                 96
              POS trigrams                96
              Character bigrams           95
              Character trigrams          96
              Positional token frequency  93
Miscellaneous
RESULTS: MISCELLANEOUS
Category       Feature                              Accuracy (%)
Miscellaneous  Function words                       96
               Punctuation (1)                      81
               Punctuation (2)                      85
               Punctuation (3)                      80
               Pronouns                             77
               Ratio of passive forms to all verbs  65
Conclusion
CONCLUSION
OTHER CONTRIBUTIONS
Ehud Alexander Avner, Noam Ordan, and Shuly Wintner,
Identifying Translationese at the Word and Sub-word Level,
Literary and Linguistic Computing, forthcoming
Gennadi Lembersky, Noam Ordan, and Shuly Wintner, Language
models for machine translation: Original vs. translated texts,
Computational Linguistics 38(4):799-825, 2012
Gennadi Lembersky, Noam Ordan, and Shuly Wintner, Improving
statistical machine translation by adapting translation models to
translationese, Computational Linguistics 39(4):999-1023, 2013
Naama Twitto, Noam Ordan, and Shuly Wintner, Statistical
Machine Translation and Automatic Identification of
Translationese, in preparation
FUTURE DIRECTIONS
ACKNOWLEDGEMENTS
Noam Ordan, Vered Volansky, Ehud Alexander Avner, Naama
Twitto, Gennadi Lembersky, Moshe Koppel
Israel Ministry of Science and Technology
BIBLIOGRAPHY
Roee Aharoni, Moshe Koppel, and Yoav Goldberg. Automatic detection of machine translated text and translation quality
estimation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages
289-295, Baltimore, Maryland, June 2014. Association for Computational Linguistics. URL
http://www.aclweb.org/anthology/P14-2048.
Omar S. Al-Shabab. Interpretation and the language of translation: creativity and conventions in translation. Janus,
Edinburgh, 1996.
Ehud Alexander Avner, Noam Ordan, and Shuly Wintner. Identifying translationese at the word and sub-word level.
Literary and Linguistic Computing, Forthcoming.
Mona Baker. Corpus linguistics and translation studies: Implications and applications. In Mona Baker, Gill Francis, and
Elena Tognini-Bonelli, editors, Text and technology: in honour of John Sinclair, pages 233-252. John Benjamins,
Amsterdam, 1993.
Marco Baroni and Silvia Bernardini. A new approach to the study of Translationese: Machine-learning the difference
between original and translated text. Literary and Linguistic Computing, 21(3):259-274, September 2006. URL
http://llc.oxfordjournals.org/cgi/content/short/21/3/259?rss=1.
Shoshana Blum-Kulka. Shifts of cohesion and coherence in translation. In Juliane House and Shoshana Blum-Kulka,
editors, Interlingual and intercultural communication: Discourse and cognition in translation and second language
acquisition studies, volume 35, pages 17-35. Gunter Narr Verlag, 1986.
Shoshana Blum-Kulka and Eddie A. Levenston. Universals of lexical simplification. Language Learning, 28(2):399-416,
December 1978.
Shoshana Blum-Kulka and Eddie A. Levenston. Universals of lexical simplification. In Claus Faerch and Gabriele Kasper,
editors, Strategies in Interlanguage Communication, pages 119-139. Longman, 1983.
Andrew Chesterman. Beyond the particular. In A. Mauranen and P. Kujamäki, editors, Translation universals: Do they
exist?, pages 33-50. John Benjamins, 2004.
Martin Gellerstam. Translationese in Swedish novels translated from English. In Lars Wollin and Hans Lindquist, editors,
Translation Studies in Scandinavia, pages 88-95. CWK Gleerup, Lund, 1986.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data
mining software: an update. SIGKDD Explorations, 11(1):10-18, 2009. ISSN 1931-0145. doi:
10.1145/1656274.1656278. URL http://dx.doi.org/10.1145/1656274.1656278.
Iustina Ilisei. A Machine Learning Approach to the Identification of Translational Language: An Inquiry into
Translationese Learning Models. PhD thesis, University of Wolverhampton, Wolverhampton, UK, February 2013. URL
http://clg.wlv.ac.uk/papers/ilisei-thesis.pdf.
Iustina Ilisei and Diana Inkpen. Translationese traits in Romanian newspapers: A machine learning approach. International
Journal of Computational Linguistics and Applications, 2(1-2), 2011.
Iustina Ilisei, Diana Inkpen, Gloria Corpas Pastor, and Ruslan Mitkov. Identification of translationese: A machine learning
approach. In Alexander F. Gelbukh, editor, Proceedings of CICLing-2010: 11th International Conference on
Computational Linguistics and Intelligent Text Processing, volume 6008 of Lecture Notes in Computer Science,
pages 503-511. Springer, 2010. ISBN 978-3-642-12115-9. URL http://dx.doi.org/10.1007/978-3-642-12116-6.
Moshe Koppel and Noam Ordan. Translationese and its dialects. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies, pages 1318-1326, Portland, Oregon,
USA, June 2011. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P11-1132.
David Kurokawa, Cyril Goutte, and Pierre Isabelle. Automatic detection of translated text and its impact on machine
translation. In Proceedings of MT-Summit XII, pages 81-88, 2009.
Sara Laviosa. Core patterns of lexical use in a comparable corpus of English lexical prose. Meta, 43(4):557-570, December
1998.
Sara Laviosa. Corpus-based translation studies: theory, findings, applications. Approaches to translation studies. Rodopi,
2002. ISBN 9789042014879.
Gennadi Lembersky, Noam Ordan, and Shuly Wintner. Language models for machine translation: Original vs. translated
texts. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages
363-374, Edinburgh, Scotland, UK, July 2011. Association for Computational Linguistics. URL
http://www.aclweb.org/anthology/D11-1034.
Gennadi Lembersky, Noam Ordan, and Shuly Wintner. Adapting translation models to translationese improves SMT. In
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics,
pages 255-265, Avignon, France, April 2012a. Association for Computational Linguistics. URL
http://www.aclweb.org/anthology/E12-1026.
Gennadi Lembersky, Noam Ordan, and Shuly Wintner. Language models for machine translation: Original vs. translated
texts. Computational Linguistics, 38(4):799-825, December 2012b. URL
http://dx.doi.org/10.1162/COLI_a_00111.
Gennadi Lembersky, Noam Ordan, and Shuly Wintner. Improving statistical machine translation by adapting translation
models to translationese. Computational Linguistics, 39(4):999-1023, December 2013. URL
http://dx.doi.org/10.1162/COLI_a_00159.
Linn Øverås. In search of the third code: An investigation of norms in literary translation. Meta, 43(4):557-570, 1998.
Marius Popescu. Studying translationese at the character level. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, and
Nicolas Nicolov, editors, Proceedings of RANLP-2011, pages 634-639, 2011.
Gideon Toury. Interlanguage and its manifestations in translation. Meta, 24(2):223-231, 1979.
Gideon Toury. In Search of a Theory of Translation. The Porter Institute for Poetics and Semiotics, Tel Aviv University,
Tel Aviv, 1980.
Gideon Toury. Descriptive Translation Studies and beyond. John Benjamins, Amsterdam / Philadelphia, 1995.
Yulia Tsvetkov, Naama Twitto, Nathan Schneider, Noam Ordan, Manaal Faruqui, Victor Chahuneau, Shuly Wintner, and
Chris Dyer. Identifying the L1 of non-native writers: the CMU-Haifa system. In Proceedings of the Eighth Workshop
on Innovative Use of NLP for Building Educational Applications, pages 279-287. Association for Computational
Linguistics, June 2013. URL http://www.aclweb.org/anthology/W13-1736.
Naama Twitto-Shmuel, Noam Ordan, and Shuly Wintner. Statistical machine translation and automatic identification of
translationese. Under review, Forthcoming.
Hans van Halteren. Source language markers in EUROPARL translations. In Donia Scott and Hans Uszkoreit, editors,
COLING 2008, 22nd International Conference on Computational Linguistics, Proceedings of the Conference, 18-22
August 2008, Manchester, UK, pages 937-944, 2008. ISBN 978-1-905593-44-6. URL
http://www.aclweb.org/anthology/C08-1118.
Ria Vanderauwera. Dutch novels translated into English: the transformation of a minority literature. Rodopi,
Amsterdam, 1985.
Vered Volansky, Noam Ordan, and Shuly Wintner. On the features of translationese. Literary and Linguistic Computing,
Forthcoming.