PTLC2005 J. Szpyra-Kozłowska, J. Frankiewicz, M. Nowacka, L. Stadnicka, Assessing Assessment Methods

Assessing assessment methods: on the reliability of pronunciation tests in EFL


Jolanta Szpyra-Kozłowska, Justyna Frankiewicz, Marta Nowacka, Lidia Stadnicka
Maria Curie-Skłodowska University, Lublin, Poland
1. Introductory remarks

Teaching another language is inevitably tied to testing. Teachers have to assess the learners' linguistic ability, their progress and achievements. In this respect pronunciation is no different from other language skills; if we regard it as an important element of communicative competence which deserves a place in language instruction, we should also be able to evaluate the process of teaching and learning it as well as its outcome. Yet, as pointed out by Celce-Murcia et al. (1996: 341), in the existing literature on teaching pronunciation, little attention is paid to issues of testing and evaluation. The major reason for this neglect is the fact that, as argued by Heaton (1988: 88), speaking, which obviously comprises pronunciation, is too complex a skill to permit any reliable analysis to be made for the purpose of objective testing.

The present paper addresses the issue of the reliability of the most frequently employed methods of assessing EFL learners' pronunciation. First we examine impression-based pronunciation testing in the internationally recognized Cambridge English Examinations and point to its various shortcomings. Next we present a report on an experiment which compares two approaches to pronunciation testing: holistic (global, impressionistic) and atomistic (analytic). We point to their strengths and weaknesses, and show that they are not equivalent and lead to different results.

2. Pronunciation assessment in Cambridge English Examinations

In evaluating different methods of pronunciation testing, it seems useful to start by analyzing the way in which it is done in international English language examinations. Pronunciation does not play any important role in the majority of them (for a detailed analysis see Szpyra-Kozłowska 2003). Cambridge examinations are no exception to this rule; candidates get only 5%-6% of the total score for this skill.
The assessment is impressionistic in nature. Thus, the following criteria have been adopted for the 5 basic examinations:

KET (Key English Test): pronunciation is heavily influenced by L1 features and may at times be difficult to understand;
PET (Preliminary English Test): pronunciation is generally intelligible, but L1 features may put a strain on the listener;
FCE (First Certificate in English): although pronunciation is easily understood, L1 features may be intrusive;
CAE (Certificate in Advanced English): L1 accent may be evident but does not affect the clarity of the message;
CPE (Certificate of Proficiency in English): pronunciation is easily understood and prosodic features are used effectively; many features, including pausing and hesitation, are native-like.

It is obvious that these requirements are very general and impression-based. Comments addressed to examiners also make constant reference to the vague notions of intelligibility and the amount of strain a candidate's pronunciation puts on the listener. In the manual, evaluators, who are usually experienced nonnative teachers of English, are instructed as follows: "when assessing pronunciation, examiners should try to put themselves in the position of a non-EFL specialist, native speaker of English and assess the amount of strain on the listener and the degree of patience and effort required to understand the candidate." This procedure raises the following doubts:

1. A professional teacher of English cannot be required to pretend to be a non-EFL specialist who, in addition, is a native speaker of English; not everyone has a talent for pretending to be a completely different person (what if the examiner fails?).
2. It is not clear what kind of native speaker the examiner is supposed to impersonate: a well-travelled university professor, familiar with many nonnative varieties of English, or a small-town housewife who has never left her birthplace?
3. A nonnative teacher can in most cases understand even very bad English spoken by fellow countrymen because of frequent exposure to it, and is therefore in no position to judge its intelligibility to users of English of other nationalities.
4. Having no precise criteria of pronunciation assessment, the examiner is likely to adopt his or her own subjective principles of evaluation (see section 3). This often happens in spite of standardization procedures and examiners' training.

We can conclude that the examinations under analysis do not provide clear-cut criteria for assessing the examinees' pronunciation: they rely too heavily on very imprecise impressionistic judgements and make unreasonable demands on nonnative examiners. This, in turn, seriously undermines their inter-rater reliability.

3. Holistic versus atomistic pronunciation testing

As shown in the preceding section, Cambridge English Examinations, like many other language tests, employ rather objectionable impressionistic evaluation. It is therefore crucial to examine its logical alternative, i.e. analytic testing.
In this section these two approaches to pronunciation assessment are compared and verified.

In the holistic approach to language testing (Alderson et al. 1996: 289), examiners are asked not to pay too much attention to any one aspect of a candidate's performance, but rather to judge its overall effectiveness. The greatest advantage of this procedure is that it can be administered to large groups of learners within a short period of time. Moreover, according to Underhill (1987: 101), impression marking is used for the kind of categories that are very hard to define but everybody agrees are important: fluency, ability to communicate, style, naturalness of speech, and so on. For these reasons it is advocated by many researchers (e.g. Celce-Murcia et al. 1996, Hughes 1991, Koren 1995).

Nevertheless, global pronunciation testing has many drawbacks. It is often too general and imprecise since the assessment criteria in the rating scales, as has been shown in section 2, tend to be vague. This means, in consequence, that different raters might adopt their own criteria of evaluation. Finally, as pointed out by Underhill (1987: 101), "making accurate impression-based assessments requires a lot of experience. (...) Even experienced assessors find it difficult to make consistent impression-based judgements." In other words, this procedure raises problems of both intra-rater and inter-rater reliability.

Analytic evaluation consists in establishing a detailed marking scheme in which specific aspects of the learner's performance are evaluated separately. Subsequently these different ratings are combined to provide an overall mark. An atomistic approach to pronunciation testing thus involves judgements on the correctness of the learner's production of particular vowels, consonants, stress, rhythm, intonation, etc. This method of pronunciation testing is claimed to be more objective than the holistic approach as it provides a more detailed diagnosis of the learner's problems and achievements. It is generally preferred by pronunciation specialists and phoneticians (e.g. Vaughan-Rees 1989).

On the other hand, the atomistic procedure is not without its problems. It is extremely time-consuming: the learners' speech samples have to be recorded and the raters have to listen to them several times. For these reasons this approach seems unsuitable for large classes and examinations with many participants.

According to Hughes (1991), the choice between holistic and analytic scoring depends to some extent on the purpose of testing; atomistic tests are more reliable for diagnostic purposes in the language classroom and in situations in which scoring is carried out in many places by different judges, while holistic evaluation, which is faster, is more appropriate for experienced scorers who are thoroughly familiar with the grading system.

In order to compare both approaches, we carried out an experiment whose primary goal was to examine whether the holistic and atomistic procedures of pronunciation testing are equivalent and bring about the same results. In the experiment reported here 10 judges, all teachers of English, evaluated the pronunciation of 10 randomly selected intermediate Polish learners, secondary school pupils, who were asked to read aloud a short passage, which was recorded. The raters were first asked to evaluate the pupils' recorded pronunciation holistically, using the ordinary scale of Polish school marks of 1, 2, 2.5, 3, 3.5, 4, 4.5, 5 and 6, where 1 = failure and 6 = excellent. After a break of two weeks the same group of raters assessed the recordings once again. On this occasion they were given the following 6 criteria to be employed in the evaluation: pronunciation of individual words, vowel quality (the /ɪ/ - /i:/ distinction in particular), the interdental fricatives, the -ing suffix, word stress and other phonetic features. Each of these aspects was rated individually using the same scoring scale as before. Subsequently, the means were calculated. Finally, the assessors were asked to comment on the strengths and weaknesses of both approaches.

The questionnaires revealed that in making the holistic evaluation the raters in fact adopted various analytic criteria (such as the pronunciation of silent letters, intonation, pauses, devoicing of final obstruents, etc.), which differed from person to person. Moreover, 90% of the assessors regarded atomistic testing as more reliable and objective.

The table below contains the results of the experiment. We provide averaged atomistic and holistic marks given by the raters.

Learner    Holistic    Atomistic
L1         3.7         3
L2         3           2.5
L3         4           3.2
L4         3.2         3.1
L5         4           3.3
L6         4.4         4.1
L7         4.1         3.6
L8         3           3
L9         2.5         2.8
L10        3.4         3.1
Mean       3.53        3.17

Table 1. Results of holistic and atomistic assessment

As can clearly be observed, in 8 cases out of 10 the mean atomistic marks are lower than the holistic marks. In one case the results are reversed and in one they are the same. The obtained means are 3.53 in the holistic evaluation and 3.17 in the analytic procedure.

To verify the obtained results, another experiment, a replica of the previous one, was conducted with a different group of 5 raters and 5 other learners. This time the mean scores were 3.56 in the holistic and 3.04 in the analytic assessment. Thus, a conclusion can be drawn that the holistic and atomistic approaches to pronunciation testing are not equivalent; the former usually results in higher scores than analytic assessment. This means that raters generally tend to be more lenient in their overall impressions than in judgements made on the basis of more specific criteria. A likely explanation of this phenomenon is that atomistic testing focuses on error finding to a greater degree than the holistic procedure, in which the criterion of intelligibility is employed and allows for a more tolerant approach to phonetic inaccuracies.

4. Final remarks

Pronunciation is extremely difficult to test in an objective and reliable fashion. We have demonstrated that Cambridge English Examinations, just like other similar tests, are based entirely on impressionistic evaluation and raise many objections with regard to their reliability. We have considered an alternative procedure of analytic evaluation and demonstrated that the two methods are not exactly equivalent, holistic assessment being more lenient and permissive than analytic assessment.

The atomistic approach can be regarded as more objective and reliable, and is particularly well-suited for diagnostic purposes as it allows the teacher to identify the learners' specific pronunciation problems to be dealt with in the course of subsequent instruction. It is, however, time-consuming and not easy to execute with large groups of learners or examinees.

Holistic testing, on the other hand, is technically simpler to carry out. It is invaluable in assessing the overall impression, the intelligibility of the learner's speech and other aspects of pronunciation which cannot be easily expressed by means of definite, clear-cut criteria. Its reliability, however, is questionable. Apparently, neither of these two methods can be viewed as fulfilling all the necessary requirements of objectivity, reliability and practicality.
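The summary figures quoted for Table 1 (the two overall means and the 8/1/1 split between learners) can be re-derived from the per-learner marks. The following is only a minimal sketch in Python, not part of the original study: the names TABLE_1 and summarize are illustrative, and the marks are transcribed from the table as printed.

```python
# Per-learner mean marks transcribed from Table 1: (holistic, atomistic).
# The dictionary name and layout are illustrative, not from the paper.
TABLE_1 = {
    "L1": (3.7, 3.0),
    "L2": (3.0, 2.5),
    "L3": (4.0, 3.2),
    "L4": (3.2, 3.1),
    "L5": (4.0, 3.3),
    "L6": (4.4, 4.1),
    "L7": (4.1, 3.6),
    "L8": (3.0, 3.0),
    "L9": (2.5, 2.8),
    "L10": (3.4, 3.1),
}


def summarize(marks):
    """Return the overall means and count how the two methods compare."""
    n = len(marks)
    holistic_mean = sum(h for h, _ in marks.values()) / n
    atomistic_mean = sum(a for _, a in marks.values()) / n
    # Learners whose atomistic mark is below / above their holistic mark.
    lower = sum(1 for h, a in marks.values() if a < h)
    higher = sum(1 for h, a in marks.values() if a > h)
    return {
        "holistic_mean": round(holistic_mean, 2),
        "atomistic_mean": round(atomistic_mean, 2),
        "atomistic_lower": lower,
        "atomistic_higher": higher,
        "equal": n - lower - higher,
    }


if __name__ == "__main__":
    for key, value in summarize(TABLE_1).items():
        print(f"{key}: {value}")
```

Run on the transcribed marks, the sketch reproduces the reported means of 3.53 and 3.17 and the split of 8 learners marked lower atomistically, 1 marked higher and 1 marked the same. It says nothing about statistical significance, which the paper does not report.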

References

Alderson, J. C., Wall, D. & C. Clapham. (1996). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
Celce-Murcia, M., Brinton, D. & J. Goodwin. (1996). Teaching Pronunciation: a Reference for Teachers of English to Speakers of Other Languages. Cambridge: Cambridge University Press.
Heaton, J. B. (1988). Writing English Language Tests. London: Longman.
Hughes, A. (1991). Testing for Language Teachers. Cambridge: Cambridge University Press.
Koren, S. (1995). Foreign language pronunciation testing: a new approach. System 23 (3). 387-400.
Szpyra-Kozłowska, J. (2003). Miejsce i rola fonetyki w międzynarodowych egzaminach Cambridge, TOEFL i TSE. Zeszyty Naukowe PWSZ w Płocku. Neofilologia. Tom V. 181-191.
Underhill, N. (1987). Testing Spoken Language. A handbook of oral testing techniques. Cambridge: Cambridge University Press.
Vaughan-Rees, M. (1989). The testing of pronunciation receptive skills. Speak Out! 4. p. 8.
