The document summarizes the purposes and contents of the Malmö University-Chalmers Corpus of Writing as a Process (MUCH corpus). The MUCH corpus aims to facilitate the study of writing processes by collecting multiple drafts of student and PhD student texts, tagged for writing and feedback processes and for rhetorical and linguistic structures. The initial pilot version contains 500,000 words, and the complete corpus will contain approximately 1.2 million words. The unique design of capturing the drafting process presents challenges: how to structure the corpus, which metadata to include, and which interface best serves researchers.
The Malmö University-Chalmers Corpus of Writing as a Process
Purposes

- To build a drafts corpus to facilitate the study of writing processes from different perspectives
- To study features such as feedback, argumentation techniques and rhetorical structures
- To use the corpus for the teaching of academic writing

Our contribution

The MUCH corpus:
- includes multiple drafts of student texts
- tags both writing and feedback processes
- tags both rhetorical and linguistic structures
- brings together composition studies, EFL studies and corpus linguistics

Challenges

The unique design presents several challenges, for example:
- How do we structure and format the corpus to fit current archiving standards?
- What kind of metadata should be included?
- What kind of interface best serves the researcher and user community?
- To what extent can existing tagging systems be used, and what new tags need to be devised for our material?

The corpus

Pilot version of 500,000 words, made up of three drafts of:
- 400 student texts
- 50 PhD student texts
- Peer and teacher comments
- Self-reflective comments

Complete corpus: approximately 1.2 million words

Tagging a feedback sequence

1. First draft, PhD student
   Long chain n-3 polysaturated fatty acids (LC n-3 PUFA), especially EPA and DHA found in
2. Comments from two PhD students
   A: I think it is better to define them.
   B: Are these known by your readers?
3. Revised draft (changes underlined)
   Long chain n-3 polysaturated fatty acids (LC n-3 PUFA), especially EPA (eicosapentaenoic acid) and DHA (docosapentaenoic acid) found in

The team

Andreas Eriksson, Chalmers, Gothenburg
Damian Finnegan, Asko Kauppinen, Maria Wiktorsson, Anna Wärnsby, Malmö University, Malmö
Peter Withers, Max Planck Institute for Psycholinguistics, Nijmegen

Contact: andreas.eriksson@chalmers.se, anna.warnsby@mah.se

Funding: the initial phase of the project is supported by the Crafoord Foundation, Lund, Sweden