
Investigating the Effect of Raters' L1 Background on Writing Assessment

A Presentation for IJAS


Paris, France, April 8, 2013
by Farah Bahrouni
Sultan Qaboos University (SQU), Oman
bahrouni@squ.edu.om

Confusion is the beginning of learning.


Socrates (469-399 BC)

If we knew what we were doing, we wouldn't call it research.


Albert Einstein

These two quotations might explain why I am here!

Outline:
1) Claim
2) Literature
3) Study
   3.1 Data collection
   3.2 Tools: FACETS & one-way ANOVA
   3.3 Results
4) Conclusion: Implication & Significance
5) References & Further Reading

1. Claim
- L1 is downplayed in the literature, treated as secondary to culture.
- L1 is a significant standalone source of rater discrepancy in performance assessment.
- L1 should be studied as a facet (aspect/feature/factor) in its own right.


2. Literature
Research has established that writing assessment can by no means be objective. Studies have extensively probed the possible reasons for its subjectivity:

Weigle (1994: 23-24) grouped sources of raters' disagreement into three categories:
- within the text: prompt, writer's background & ability
- within the rater (the focus of this study): physical & psychological conditions
- within the rating context: when, where & under what conditions the rating is done
She adds that interactions among these sources are also possible: "A rater from a certain background may react to a text written in a certain style differently from the way a rater from a different background would" (p. 24).

Bachman (1990) refers to the above sources as "potential sources of measurement error" and categorizes them into three groups:
- test method factors (e.g. raters, prompt type, etc.)
- personal attributes (e.g. test taker's cognitive style, knowledge of particular content, etc.)
- random factors (e.g. fatigue, time of day, etc.)

3. Study
3.1 Data collection
Quantitative design: 32 ESL teachers from 4 different language backgrounds (8 native speakers of English; 8 Arabs, who share the students' mother tongue; 8 Indians; and 8 Russians) scored 3 essays written by 3 Omani university students. All raters are experienced ESL/EFL teachers and have taught in the Omani context for a minimum of 2 years. (A sketch of the resulting data layout follows.)
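To picture the design, the ratings form a fully crossed long-format table with one row per rater-essay pair (32 raters x 3 essays = 96 scores). The Python sketch below is a minimal illustration; the column names and score values are invented placeholders, not the study's data.

# Hypothetical long-format layout of the rating data:
# one row per rater-essay pair (32 x 3 = 96 rows in full).
import pandas as pd

scores = pd.DataFrame({
    "rater_id": ["R01", "R01", "R01", "R02"],            # ... through R32
    "l1_group": ["Native", "Native", "Native", "Arab"],  # Native / Arab / Indian / Russian
    "essay_id": ["E1", "E2", "E3", "E1"],
    "score":    [72, 65, 80, 60],                        # placeholder marks
})

# Per-L1-group summary: the starting point for both FACETS and ANOVA.
print(scores.groupby("l1_group")["score"].describe())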


3.2 Analysis: FACETS & One-Way ANOVA
FACETS (the measurement model behind it is sketched below):
- Vertical ruler: the higher up in the column, the more severe the rater.
- L1 measurement report (Table 7.3.1): all indices (measure, fit analysis, reliability) show that the difference between the 4 L1s is significant.
One-way ANOVA (a code sketch follows this section):
- ANOVA indicates some similarities between native speakers and Indians on the one hand, and between Arabs and Russians on the other, in the ways they scored the 3 essays. The significant discrepancy is between Arabs and Indians.
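As background to the FACETS step: FACETS implements Linacre's many-facet Rasch model. A standard formulation of the model (stated here for orientation, not quoted from the presentation) gives the log-odds of examinee n receiving category k rather than k-1 from rater j on task i as

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

where B_n is the examinee's ability, D_i the difficulty of the task (here, the essay prompt), C_j the severity of rater j, and F_k the difficulty of scale step k. The severity estimates C_j are what the vertical ruler orders; entering raters' L1 as a facet lets the model test whether severity differs systematically across the 4 L1 groups.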
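The one-way ANOVA step can be reproduced in outline with scipy.stats.f_oneway. The sketch below uses invented per-group score lists (one placeholder mark per rater), since the study's actual scores are not reproduced here.

# One-way ANOVA across the four L1 groups (placeholder scores).
from scipy.stats import f_oneway

native  = [72, 65, 80, 74, 68, 77, 70, 75]
arab    = [60, 58, 66, 62, 59, 64, 61, 63]
indian  = [73, 69, 78, 71, 75, 70, 74, 76]
russian = [61, 57, 65, 60, 63, 59, 62, 64]

f_stat, p_value = f_oneway(native, arab, indian, russian)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A small p-value says at least one group mean differs; pairwise
# post-hoc tests (e.g. Tukey's HSD) would locate which pairs differ,
# mirroring the Arab-Indian contrast reported above.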

4. Conclusion: Implication & Significance


- Findings indicate that L1 does have a significant impact on a rater's behavior in a writing assessment event.
- This jeopardizes the reliability of the scoring process as well as the validity of the obtained results.
- A panoply of ways could be used to mitigate the L1 impact:
  - Training
  - Double/triple marking
  - Improving the marking criteria so that raters' idiosyncrasies are prevented from playing a role

REFERENCES & FURTHER READING
Alderson, J. C. (1991). Bands and scores. In J. C. Alderson & B. North (Eds.), Language Testing in the 1990s: The Communicative Legacy (pp. 71-86). London and Basingstoke: Macmillan.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brindley, G. (1998). Describing language development? Rating scales and SLA. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between Second Language Acquisition and Language Testing Research. Cambridge: Cambridge University Press.
Fulcher, G. (2000). The 'communicative' legacy in language testing. System, 28, 483-497.
Fulcher, G. (2010). Practical Language Testing. Hodder Education, an Hachette UK company.
Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing, 28(1), 5-29.
Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing Second Language Writing in Academic Contexts (pp. 241-276). Norwood, NJ: Ablex.
Hunter, D. M., Jones, R. M., & Randhawa, B. S. (1996). The use of holistic versus analytic scoring for large-scale assessment of writing. The Canadian Journal of Program Evaluation, 11(2), 61-85.
North, B. (2000). The Development of a Common Framework Scale of Language Proficiency (Theoretical Studies in Second Language Acquisition). P. Lang.
North, B. (2003). Scales for rating language performance: Descriptive models, formulation styles, and presentation formats. TOEFL Monograph, 24.
North, B., & Schneider, G. (1998). Scaling descriptors for language proficiency scales. Language Testing, 15(2), 217-263.
Weigle, S. C. (1994). Effects of training on raters of English as a second language compositions: Quantitative and qualitative approaches. University of California, Los Angeles.
Weigle, S. C. (2002). Assessing Writing. Cambridge: Cambridge University Press.

Thank you
