
Running head: TEST REVIEWS: TOEIC, IELTS, TOEFL JUNIOR

Review of Language Proficiency Tests: TOEIC, IELTS, TOEFL Junior

Bridget Schuberg

Colorado State University

Abstract

Language tests are often used by academia and businesses as a way to measure a test-taker's proficiency and, subsequently, his or her ability to fulfill certain tasks. Because the results from such tests are often used to make high-stakes decisions, it is essential that scores from the test truly reflect the test-taker's language ability. In this paper, I note the publisher, publication date, target audience, and cost of three popular language proficiency tests. I then review each test in terms of its purpose, structure, method of scoring, statistical distribution of scores, standard error of measurement, evidence of reliability, and evidence of validity. Finally, using these criteria, I examine which test would be most appropriate in a particular context.

Keywords: mean, standard deviation, standard error of measurement, statistical distribution, reliability, validity

Review of Language Proficiency Tests: TOEIC, IELTS, TOEFL Junior

In an age in which English language proficiency tests are as popular as ever, it is especially important to analyze and evaluate the usefulness of tests. As Bachman and Palmer (1996) point out, believing that one best test exists for any given situation, misunderstanding the nature of language testing and test development, having unreasonable expectations about what language tests can do and what they should be, and placing blind faith in the technology of measurements (p. 7) are all serious misconceptions surrounding language tests. Such misconceptions can result in serious problems, such as inappropriate test use and tests which do not meet the needs of test users. It is therefore important that these potential negative consequences be taken into consideration: the purpose, structure, and scoring of tests, particularly those used to make high-stakes decisions, should be thoroughly examined for reliability, validity, and error.

In choosing the TOEIC, the IELTS, and the TOEFL Junior, I evaluated my options in terms of their popularity in regions of the world in which I will likely teach in the future. Because teaching in Asia seems like a strong possibility at some point in my career, I believe it is important to be familiar with tests that are frequently used there. Furthermore, I am currently tutoring four Korean students who can be classified as members of the target populations of all three tests. I will also be teaching a group of adult Korean ELLs next semester, some of whom will likely aim to take either the IELTS or the TOEIC.

These tests are not used exclusively in Asia, however; the IELTS has recently become the most frequently used high-stakes English test for those who do not speak English as a first language (www.ielts.org). The TOEIC's use has been increasing in the last few years as well (www.ets.org/toeic). Despite the fact that it has only been available since 2010, the TOEFL Junior is already used in over 25 countries and is in the process of expanding its availability (www.ets.org/toefljunior). Furthermore, once the TOEFL Junior unveils its Speaking and Writing test, all of the tests described in this review will offer ways of measuring proficiency in the four skill areas of listening, reading, speaking, and writing. In this way, all three tests will aim to provide a general English proficiency description of their test-takers.

Test Reviews

Test of English for International Communication (TOEIC)1

Publisher: TOEIC Program, Educational Testing Service, 1425 Lower Ferry Road, Ewing, NJ 08541 USA; telephone 609-773-7170; fax 610-628-3722; toeic@ets.org; http://www.ets.org/toeic
Publication Date: Listening and Reading: 1979; Speaking and Writing: 2006
Target Population: People working in an international environment; students at English language schools
Cost: Varies by location; contact an area representative for current prices (in the United States, the Listening and Reading test is generally $75 and the Speaking and Writing test generally $120)
Overview: The TOEIC is a high-stakes, norm-referenced test originally designed by the Japanese Ministry of International Trade in 1979. Although its original purpose was to measure only reading and listening ability related to real-life work situations (Schmitt, 2005, p. 100), a Speaking and Writing test was released in 2006 that, when taken together with the Reading and Listening test, is meant to be an accurate indicator of general language proficiency (www.ets.org/toeic).

1 Information adapted from www.ets.org/toeic.

An extended description of the most current TOEIC test is provided (see Table 1).

Table 1
Extended description of the TOEIC test

Test purpose
Scores from the TOEIC test can be used in several different ways. The test is most typically associated with business, industry, or commerce that takes place in an international setting. Employers in such settings use TOEIC scores to determine whether a potential employee is adequately prepared to compete in the global workforce, to determine the sector in which an employee should be placed, as grounds for promotion, or to decide which employees need job training in which areas. However, specialized knowledge or vocabulary beyond everyday work activities is not necessary to perform well on the test (TOEIC Listening and Reading Examinee Handbook). In recent years, schools have also begun to use the TOEIC to answer questions related to admission (i.e., who has the necessary skills to succeed in a particular institution?). The test may be used for low-stakes decision-making as well; scores may serve as a measure of the effectiveness of a particular program (i.e., are students' scores indicative of an effective program?) or as a way to determine how a given program needs to be improved (i.e., in what areas of the test are students not scoring highly?) (www.ets.org/toeic).

Test structure
All items on the TOEIC test are based on the following contexts: corporate development, dining out, entertainment, finance and budgeting, general business, health, housing/corporate property, manufacturing, offices, personnel, purchasing, technical areas, and traveling.

The Listening Comprehension section of the TOEIC, delivered by audiotape, contains 100 items (Schmitt, 2005, p. 101). Test-takers are allotted 45 minutes for this section. The format is as follows:
Photographs: 10 questions
Question-Response: 30 questions
Conversations: 30 questions (10 conversations with 3 questions each)
Talks: 30 questions (10 talks with 3 questions each)

The Reading Comprehension section of the TOEIC also contains 100 items. Test-takers are allotted 75 minutes for this section. The format is as follows:
Incomplete Sentences: 40 questions
Text Completion: 12 questions
Single Passages: 28 questions (7-10 reading texts with 2-5 questions each)
Double Passages: 20 questions (4 pairs of reading texts with 5 questions per pair)
(TOEIC Listening and Reading Examinee Handbook, 2012)

To complete the TOEIC Speaking test, test-takers are given approximately 20 minutes in which they must answer 11 questions. The tasks of the speaking test are organized and evaluated as follows:2

Questions 1-2: Read a text aloud
    Evaluation criteria: Pronunciation; Intonation and stress
Question 3: Describe a picture
    Evaluation criteria: All of the above, plus Grammar, Vocabulary, Cohesion
Questions 4-6: Respond to questions
    Evaluation criteria: All of the above, plus Relevance of content, Completeness of content
Questions 7-9: Respond to questions using information provided
    Evaluation criteria: All of the above
Question 10: Propose a solution
    Evaluation criteria: All of the above
Question 11: Express an opinion
    Evaluation criteria: All of the above

For the writing portion of the test, test-takers have approximately 60 minutes to answer 8 questions. The tasks of the writing test are organized and evaluated as follows:3

Questions 1-5: Write a sentence based on a picture
    Evaluation criteria: Grammar; Relevance of the sentences to the pictures
Questions 6-7: Respond to a written request
    Evaluation criteria: Quality and variety of your sentences; Vocabulary; Organization
Question 8: Write an opinion essay
    Evaluation criteria: Whether your opinion is supported with reasons and/or examples; Grammar; Vocabulary; Organization

2 Table taken directly from the TOEIC Examinee Handbook: Speaking and Writing (2009).
3 Table taken directly from the TOEIC Examinee Handbook: Speaking and Writing (2009).

The tasks in the Writing Test are designed to support three claims about the test-taker's performance:
1. The test taker can produce well-formed sentences, including both simple and complex sentences.
2. The test taker can produce multisentence-length text to convey straightforward information, questions, instructions, narratives, etc.
3. The test taker can produce multiparagraph-length text to express complex ideas, using reasons, evidence, and extended explanation as appropriate. (TOEIC Examinee Handbook: Speaking and Writing, 2009)

Scoring of the test
Because different jobs and tasks require different levels of English language proficiency, test-takers do not simply pass or fail the TOEIC Speaking and Writing tests; ETS does not set such requirements.

TOEIC test scores on the Listening and Reading sections are determined by the number of questions answered correctly. Raw scores are converted to scaled scores, which range from 5 to 495 points for each section (totaling 10 to 990 points for the overall score). When an overall score is determined, students receive a certificate whose color represents this score and their proficiency.

Test-takers can take the Speaking Test without taking the Writing Test. Separate scores are assigned for each section, ranging from 0 to 200 points. Writing responses are sent to ETS's Online Scoring Network and scored by certified ETS raters using the aforementioned evaluation criteria and the following score scales:

Questions   Task                                             Score Scale
1-5         Write a sentence based on a picture              0-3
6-7         Respond to a written request                     0-4
8           Write an opinion essay                           0-5

Those who take the writing test are classified as belonging to one of nine proficiency levels:

Writing Proficiency Level    Writing Scaled Score
9                            200
8                            170-190
7                            140-160
6                            110-130
5                            90-100
4                            70-80
3                            50-60
2                            40
1                            0-30

Speaking responses are sent to ETS's Online Scoring Network and scored by certified ETS raters using the aforementioned evaluation criteria and the following score scales:

Questions   Task                                             Score Scale
1-2         Read a text aloud                                0-3
3           Describe a picture                               0-3
4-6         Respond to questions                             0-3
7-9         Respond to questions using information provided  0-3
10          Propose a solution                               0-5
11          Express an opinion                               0-5

Those who take the speaking test are classified as belonging to one of eight proficiency levels:

Speaking Proficiency Level    Speaking Scaled Score
8                             190-200
7                             160-180
6                             130-150
5                             110-120
4                             80-100
3                             60-70
2                             40-50
1                             0-30

The score report form for the TOEIC test includes the test-taker's percentile rank for each section, score proficiency descriptions which detail the abilities typically associated with a specific scaled score and the individual's areas of strength, and an "abilities measured" section which lists the percentage of questions answered correctly for each construct. Scores from the TOEIC test are valid for only two years from the receipt of the certificate.

Statistical distribution of scores
Average composite mean scores, standard deviations, and standard errors of measurement were not available for the TOEIC tests from recent years. However, this information organized according to specific criteria (e.g., native country, region) was available (Appendix A). One such example, the statistical distribution of scores on a TOEIC Listening and Reading test taken by Japanese businessmen (Ito, Kawaguchi, & Ohta, 2005), is as follows:

            N       Mean    SD
Total       8,386   440.2   178.1
Listening   8,386   246.5   87.6
Reading     8,386   193.7   97.4

Composite values were also unavailable for recent TOEIC Speaking and Writing tests, but the means and standard deviations found in one published study (Powers, Kim, Yu, Weng, & VanWinkle, 2009) involving Japanese and Korean participants are as follows:

            N       Mean    SD
Speaking    3,518   122.8   30.9
Writing     1,472   148.5   31.8

However, as Schmitt (2005) points out, because the TOEIC is so popular in Asia, most statistics are from this region of the world and should not be interpreted as generalizable to test-takers from other areas.

Standard error of measurement
The TOEIC Examinee Handbook: Listening and Reading (2012) claims that the SEM for these sections is typically about 25 scaled score points for each section (p. 23). No specific SEM is given for the Speaking and Writing Test.

Evidence of reliability
For the Listening and Reading components, the KR-20 reliability index is used to determine to what extent the items measure the same construct (i.e., internal consistency). This measure is based on the proportion of persons passing each item and the standard deviation of the total scores (Miller, Linn, & Gronlund, 2008, p. 114). The TOEIC Listening and Reading Examinee Handbook asserts that the reliability of these scores across all forms from [its] norming samples has been approximately 0.90 and up (p. 23).

According to the TOEIC Speaking and Writing Examinee Handbook, ETS follows a strict 10-step process to ensure the reliability of its speaking and writing tests. First, it hires only highly qualified applicants as raters, who then undergo rigorous training in order to pass a certification test. In addition, the raters must use thorough rubrics and guidelines that have been carefully developed. Scoring leaders use statistical reports to monitor rater performance during and after every scoring session. Raters must pass a calibration test before every scoring session; those who are not rating accurately are excluded from scoring. Finally, statisticians must review and analyze all scores before they are released.

Agreement between raters on the Speaking and Writing Test was found to range from 50% to 82%; using the generalizability index, the inter-rater reliability coefficient was found to be .80 and up for items on the Speaking Test and .80 and up for items on the Writing Test. For the TOEIC Speaking Test, the stratified internal alpha for total scores ranged from .82 to .86. The test-retest reliability estimate for the TOEIC Writing Test was about .83 (Statistical Analyses for the TOEIC Speaking and Writing Pilot Study, 2010, p. 8).
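To make the KR-20 index just described more concrete, the brief sketch below computes it for a small, invented matrix of dichotomously scored (0/1) items. The data and the helper function are purely illustrative assumptions on my part and do not represent ETS's actual scoring procedures.

def kr20(responses):
    # responses: one list of 0/1 item scores per test-taker
    k = len(responses[0])                 # number of items
    n = len(responses)                    # number of test-takers
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # variance of total scores
    pq_sum = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n  # proportion passing item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Six invented test-takers answering five items
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(round(kr20(responses), 2))   # about 0.80 for this toy data set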

Evidence of validity
The TOEIC Listening and Reading Examinee Handbook (2012) asserts that validity for the test begins with the test development process, in which language-testing experts design and assemble the test in a way that includes relevant English-language tasks. The aforementioned measures for ensuring reliability are also applicable here. On a statistical level, however, the most commonly reported measure of validity for TOEIC tests comes from self-assessments made by the test-takers themselves of their own language skills and abilities. These have reportedly shown moderately strong correlations (.40s and .50s) with TOEIC scores for both the Speaking and Writing test and the Listening and Reading test. Powers (2010) notes in his study that of test-takers in the highest TOEIC reading score level (440-495), nearly all (96%) said they could perform this task either easily or with little difficulty (p. 8). The TOEIC Listening and Reading Handbook (2012) therefore claims that such self-reports can be interpreted as a measure of validity as well (p. 24). However, self-assessments are not always reliable, as participants are not always accurate judges of their own proficiency. Therefore, it is critical that the TOEIC program regularly employ more concrete statistical analyses and make the results widely available. Unfortunately, though the TOEIC test creators assure that the utmost care is taken to ensure validity, very few statistics are available to support this claim.

International English Language Testing System (IELTS): Academic4

Publisher: University of Cambridge ESOL Examinations, the British Council, and IDP: IELTS Australia. Subject Manager, University of Cambridge ESOL Examinations, 1 Hills Road, Cambridge CB1 2EU, United Kingdom; telephone 44-1223-553355; ielts@ucles.org.uk; Manager, IELTS International: 825 Colorado Boulevard, Suite 201, Los Angeles, CA 90041, USA; http://www.ielts.org/
Publication Date: 1989 (introduced as the ELTS in 1980-1981)
Target Population: Students over 16 for whom English is not a first language and who wish to work or attend university in an English-speaking country
Cost: Varies greatly by location of test center; see http://www.ielts.org/. In general, costs are US $185-190.
Overview: Although the IELTS was originally aimed at students planning to study or work in the UK, Australia, and New Zealand, in recent years the test has become widely used in the United States and Canada as well. There are two formats of the IELTS available to test-takers: Academic and General Training. The tests are designed to cover the full range of ability from non-user to expert user. An extended description of the IELTS test is provided (see Table 2).
Table 2
Extended description of the IELTS test

Test purpose

IELTS is designed to provide a measure of the English language ability of those who do not speak it as a native language but wish to either work or study in an environment in which English is the main language of communication. Depending on his or her individual purposes, a test-taker may decide to take either the IELTS Academic test (for those planning to attend an English-speaking university or work for a professional organization in which English is a necessary language) or the IELTS General Training test (for those who plan to study English at the high school level, to work or train in English, or to immigrate to Australia or New Zealand). In this review, I will examine only the Academic test. Scores from the IELTS can be used to make high-stakes decisions regarding study, training, or employment; the score required for admission is determined by each organization or institution, as IELTS does not assign pass/fail scores. All centers offer paper-based IELTS tests; computer-based tests are available only at selected centers (IELTS Handbook, 2007, p. 3).

4 Information adapted from O'Sullivan (2005) and www.ielts.org.

Test structure


The IELTS aims to measure proficiency in four skills: listening, reading, writing, and speaking. The listening section lasts 30 minutes. Test-takers listen to four recorded monologues and/or conversations played from a CD; each text is played only once. Test-takers receive time to read the questions, write down their answers, and check them. When the recording ends, candidates are allowed ten minutes to transfer their answers to an answer sheet. The first two parts of the listening component deal with social needs; the final two are related to educational or training contexts. Questions may be in a variety of formats, such as multiple choice, short answer, sentence completion, note/summary/flow-chart/table completion, labeling a diagram, classification, and matching (IELTS Handbook, 2007, p. 6). One mark is given for each correct answer in the 40-item section.

The reading section, in which test-takers read three long passages and then complete various tasks, lasts 60 minutes. Texts are authentic and may be taken from magazines, journals, books, and newspapers, but they do not require specialized knowledge. In addition to the aforementioned question types, candidates may also be asked to identify a writer's views/claims (yes/no/not given), identify information in the text (true/false/not given), or match lists and phrases. At least one of the texts contains a logical argument. The texts may contain non-verbal material such as diagrams, graphs, or illustrations. One mark is awarded for each correct answer in this 40-item section.

The 60-minute writing section consists of two tasks. In the first task, test-takers must summarize, describe, or explain a table, graph, chart, or diagram in at least 150 words; candidates are advised to spend about 20 minutes on this task. The second task is a short essay of at least 250 words; candidates should spend about 40 minutes on this part. Both tasks are assessed according to task achievement/response, coherence and cohesion, lexical resource, and grammatical range and accuracy.

Lastly, test-takers must complete an 11-14 minute speaking component in the form of a face-to-face interview. The interview is composed of three parts:
1) Introduction and interview (4-5 minutes)
2) Individual long turn (3-4 minutes, including 1 minute of preparation time)
3) Two-way discussion (4-5 minutes)

Scoring of the test
Scoring is done by trained examiners who are closely monitored. Each test-taker receives a band score for each individual component (ranging from 1, Non-user, to 9, Expert user). The four component scores are then averaged and rounded to produce an Overall Band Score, which may be reported in half or whole bands (www.ielts.org). Typically, universities require at least a composite band score of 6.5 (O'Sullivan, 2005, p. 75).
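As an illustration of the averaging just described, the short sketch below combines four hypothetical component bands into an overall band. The rounding rule assumed here (average the four bands, then round to the nearest half band, with ties such as .25 and .75 rounded upward) is an approximation for demonstration purposes; the official IELTS rounding conventions should be confirmed at www.ielts.org.

import math

def overall_band(listening, reading, writing, speaking):
    # Average the four component bands, then round to the nearest half band,
    # rounding ties upward (an assumed rule, for illustration only).
    average = (listening + reading + writing + speaking) / 4
    return math.floor(average * 2 + 0.5) / 2

# Hypothetical candidate: the average of 6.5, 6.5, 5.0, and 7.0 is 6.25,
# which would be reported as an Overall Band Score of 6.5 under this rule.
print(overall_band(6.5, 6.5, 5.0, 7.0))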

The Band Score Descriptions are as follows:

Band 9: Expert user: has fully operational command of the language: appropriate, accurate and fluent with complete understanding.
Band 8: Very good user: has fully operational command of the language with only occasional unsystematic inaccuracies and inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation well.
Band 7: Good user: has operational command of the language, though with occasional inaccuracies, inappropriacies and misunderstandings in some situations. Generally handles complex language well and understands detailed reasoning.
Band 6: Competent user: has generally effective command of the language despite some inaccuracies, inappropriacies and misunderstandings. Can use and understand fairly complex language, particularly in familiar situations.
Band 5: Modest user: has partial command of the language, coping with overall meaning in most situations, though is likely to make many mistakes. Should be able to handle basic communication in own field.
Band 4: Limited user: basic competence is limited to familiar situations. Has frequent problems in understanding and expression. Is not able to use complex language.
Band 3: Extremely limited user: conveys and understands only general meaning in very familiar situations. Frequent breakdowns in communication occur.
Band 2: Intermittent user: no real communication is possible except for the most basic information using isolated words or short formulae in familiar situations and to meet immediate needs. Has great difficulty understanding spoken and written English.
Band 1: Non-user: essentially has no ability to use the language beyond possibly a few isolated words.
Band 0: Did not attempt the test: no assessable information provided.
(www.ielts.org)

Statistical distribution of scores
The following tables show the mean band scores for female and male candidates in the Academic module in 2011.

Mean band scores for female candidates
           Listening   Reading   Writing   Speaking   Overall
Academic   6.1         6.1       5.6       5.9        6.0

Mean band scores for male candidates
           Listening   Reading   Writing   Speaking   Overall
Academic   5.9         5.8       5.4       5.7        5.8

Mean scores organized according to country of origin and native language can be found at www.ielts.org.

The following table shows the mean, standard deviation, and standard error of measurement for the Listening and Academic Reading modules in 2011.

                    Mean   Standard deviation   SEM
Listening           6.1    1.3                  0.390
Academic Reading    5.9    1.2                  0.379

Composite means, standard deviations, and SEMs for Speaking and Writing are not available (www.ielts.org).

Standard error of measurement
According to www.ielts.org, the composite SEM for the Academic module in 2011 was 0.22. The SEMs for the Listening and Academic Reading components in 2011 were 0.390 and 0.379, respectively. No SEMs were given for the Speaking and Writing components. The IELTS handbook notes that the SEM should be thought of in terms of the final band scores for the Listening and Reading modules.
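One way to read these SEM figures, sketched below, is as an approximate confidence band around an observed score, under the common assumption that measurement error is roughly normally distributed. The observed bands used in the example are hypothetical, and this is an interpretive illustration rather than an IELTS procedure.

def approximate_band(observed, sem, z=1.96):
    # Approximate 95% band: observed score plus or minus 1.96 standard errors.
    return observed - z * sem, observed + z * sem

# Hypothetical observed scores, using the 2011 SEM figures reported above
print(approximate_band(6.5, 0.22))    # composite SEM: roughly (6.07, 6.93)
print(approximate_band(6.0, 0.390))   # Listening: roughly (5.24, 6.76)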

Evidence of reliability
The reliability of the listening and reading components of the IELTS is reported using Cronbach's alpha to measure internal consistency. The alpha values from the Listening and Academic Reading material from 2011 are as follows:
Average alpha across versions of the Listening module: 0.91
Average alpha across versions of the Academic Reading module: 0.90
The speaking and writing components, however, cannot be assessed for reliability in the same manner, as they are not item-based; instead, performances are rated by certified examiners who have been trained face-to-face and who follow detailed descriptive criteria and rating scales. All examiners must be retrained and re-certified every two years (www.ielts.org). Furthermore, the standards used in the IELTS are fixed so as to further contribute to reliability (IELTS Handbook, 2007). Recent generalizability studies based on examiner certification data have shown coefficients of 0.83-0.86 for Speaking and 0.81-0.89 for Writing. The composite reliability of the Academic version of the IELTS in 2011 was reported as 0.96 (www.ielts.org).

Evidence of validity
The use of authentic texts as a basis for test tasks contributes to the seemingly high face validity of the IELTS, as the assessment tasks purportedly correspond closely to the target language use (TLU) domain (Miller et al., 2008; Bachman, 2006). Furthermore, as reliability is often considered a part of validity, the aforementioned reliability information can also be seen as a demonstration of validity. The most common form of research used to address the question of validity in the IELTS is that which explores predictive validity. For example, Humphreys, Haugh, Fenton-Smith, Lobo, Michael, and Walkinshaw (2010) gave an IELTS test to students before and after a semester of study in undergraduate courses. They found that the listening and reading scores of these students were highly correlated with their GPA during that semester. Typically, these studies show a decent correlation between academic outcomes and IELTS scores upon entry to an institution. These studies can be accessed by visiting http://ielts.org/researchers/research/predictive-validity.aspx

In addition to the statistical evidence provided in these studies, the IELTS Handbook asserts that every step taken in the test production process fosters validity. The test-item creators are experts commissioned from all over the world, and they are guided by a strict table of specifications. Using feedback from the pre-editing phase, items are revised once more. Then, representative groups of future IELTS test-takers pretest the new material to make sure it is suitable and effective for the purposes of the IELTS. The statistics on reading and listening tasks, examiner reports on writing and speaking tasks, and feedback from the pre-test candidates are gathered from this pretesting and then analyzed by the Research and Validation group at Cambridge ESOL to determine the measurement characteristics of the material (e.g., difficulty of items, discrimination potential of items).


From there, the creators of the test can make informed decisions regarding whether or not the test material is appropriate for an IELTS test, and standards are fixed to indicate the same measure of ability in each component.

Each component of the test is carefully controlled to ensure that the test-taker does not have to demonstrate proficiency in a skill area irrelevant to the defined construct of each section. Therefore, the oral/written input given to help test-takers complete a speaking or writing task is minimal, and tasks in the reading and listening sections do not require detailed writing or speaking. This strict adherence to a well-defined table of specifications can be seen as contributing to the construct validity of the test (Miller et al., 2008). The great length of the test may also contribute to its comprehensiveness and therefore to its validity. Additionally, the format of the speaking component is supported by a substantial body of academic research claiming that speaking face-to-face with an interviewer, as opposed to a computer, prompts more realistic answers and is preferred by test-takers. To reduce potential bias, the item designers are obliged to include a range of cultural perspectives and a variety of voices and accents in the listening sections; they must also balance task types, topics, and genres to ensure they do not favor test-takers of a particular intelligence or those with knowledge of a specific topic (IELTS Handbook, 2007, p. 16).

Test of English as a Foreign Language (TOEFL) Junior: Standard

Publisher: TOEFL Junior Program, Educational Testing Service, PO Box 6156, Princeton, NJ 08548-6155 USA; fax: 973-735-1903; toefljunior@ets.org; http://www.ets.org/toefl_junior
Publication Date: 2010
Target Population: EFL students worldwide aged 11 to 15 who are representative of English-medium instructional environments
Cost: Dependent upon the number of tests ordered.
Number of Tests              Price Each
5 copies (minimum order)     $200
6-100                        $40
101-200                      $35

Overview: The TOEFL Junior test was first introduced by ETS in 2010 to provide teachers with a measure of their students' English proficiency. The creators of the TOEFL Junior are currently working on the TOEFL Junior Comprehensive test, which will be computer-based and will include speaking and writing components. No test centers offer the TOEFL Junior; those who wish to administer it must purchase it from ETS, indicating how many test books and answer sheets they desire. ETS supplies a CD for the listening comprehension section, brochures about the test, student consent forms, a supervisor's manual with administration instructions, and a kit for returning the answer sheets to ETS for processing. An extended description of the TOEFL Junior test is provided (see Table 3).


Table 3
Extended description of the TOEFL Junior test

Test purpose
The TOEFL Junior is a paper-based test used to help teachers make low- to medium-stakes decisions regarding their students. It is meant to provide a general standard for measuring the proficiency levels of students aged 11-15 who are representative of an English-medium instructional environment. The test is designed to see how well students can succeed in the following areas: negotiat[ing] social and interpersonal interactions in the school context, navigat[ing] the school environment and receiv[ing] school-related information, and learn[ing] academic material in the content areas (Wolf & Steinberg, 2012, p. 1). The scores acquired from the test can be used to answer a variety of questions, such as those related to admission (does the student have the skills to succeed in a classroom that heavily requires communicative English ability?), placement (are students matched with level-appropriate instruction?), and achievement (how much progress has this student made over time?). Because each TOEFL Junior score report form includes the student's Lexile measure, the test is also useful in advising students on appropriate book selections. The most common users of the TOEFL Junior tests are English language programs, international schools in which English is the language of instruction, and schools in non-English-speaking countries that teach content through English.

Test structure
TOEFL Junior test-takers are given 1 hour and 55 minutes for completion. The paper-based test is composed of three sections: listening comprehension (42 items, 40 minutes), language form and meaning (42 items, 25 minutes), and reading comprehension (42 items, 50 minutes). Organized by subskills, the listening comprehension section is composed of 40% understanding explicit information and details, 23% getting the main idea, and 37% making inferences. The language form and meaning section is 70% grammar and 30% vocabulary. Lastly, the reading comprehension section is composed of approximately 23% linguistic knowledge (e.g., vocabulary and grammar), 43% understanding explicit information and details, 13% getting the main idea, and 20% making inferences. Organized by context, approximately 23% of the content is considered social (e.g., greeting, describing), 20% navigational (e.g., identifying, extracting information), and 57% academic (e.g., defining, summarizing, evaluating) (Wolf & Steinberg, 2012, p. 1).

ETS is currently in the process of developing a comprehensive TOEFL Junior test that will also include speaking and writing sections. Information regarding how these sections will be structured is not yet available.

Scoring of the test
The number of questions a student has answered correctly determines his/her TOEFL Junior test score. Wrong answers are not penalized. The listening and reading sections, which contain only multiple-choice questions, are scored by computer. (The speaking and writing portions that will soon be released, however, will be scored by ETS-trained human raters.) Each section is worth 200-300 points, for a total score of 600-900 points. Every test-taker's overall score is assigned an overall band score from 1 to 6. The band descriptions are as follows:5

Level 6 (Excellent): A typical student at Level 6 consistently demonstrates the skills needed to communicate at a high level in complex interactions and while using complex materials.
Level 5 (Advanced): A typical student at Level 5 often demonstrates the skills needed to communicate at a high level in complex interactions and while using complex materials.
Level 4 (Competent): A typical student at Level 4 demonstrates the skills needed to communicate successfully in some complex situations and in most simple interactions and while using basic materials.
Level 3 (Achieving): A typical student at Level 3 usually demonstrates the skills needed to communicate successfully in simple interactions and while using basic materials.
Level 2 (Developing): A typical student at Level 2 occasionally demonstrates the skills needed to communicate successfully in simple interactions and while using basic materials.
Level 1 (Beginning): A typical student at Level 1 demonstrates some basic language skills but needs to further develop those skills in order to communicate successfully.

5 Table taken directly from www.ets.org/toefljunior.

Information regarding the correspondence between raw scores and scaled scores was not available on the TOEFL Junior website or in any unpublished studies. Each TOEFL Junior score report provides a description of the typical abilities associated with the test-taker's scaled score level to help test-takers identify their strengths and weaknesses. Furthermore, each section score is mapped to the Common European Framework of Reference (CEFR).

Statistical distribution of scores
As the TOEFL Junior is so new, few descriptive statistics are available aside from what follows:


Descriptive Statistics for English Language Learners in Grades 7 and 8

           Grade 7 (N = 427)               Grade 8 (N = 311)
Section    Min    Max    Mean    SD        Min    Max    Mean    SD
Total      605    895    768     62.5      620    885    777     62.7
LC         200    300    272     20.2      200    300    274     20.8
LFM        200    300    255     21.9      200    300    260     21.9
RC         200    295    240     28.7      200    295    243     29.4
(Wolf & Steinberg, 2011)

Standard error of measurement
The standard error of measurement for the TOEFL Junior Standard test is around 10 for all three measures (Listening Comprehension, Language Form and Meaning, and Reading Comprehension) (TOEFL Junior Test Handbook, in press, p. 29). Using the equation stated in Miller et al. (2008, p. 121) and the reliability coefficients listed below, the standard error of measurement for the total score can also be calculated:

SEM = s × √(1 − r), where s is the standard deviation and r is the reliability coefficient for the test.

The SEM for the total test for 7th graders = 62.5 × √(1 − .95) ≈ 13.98.
The SEM for the total test for 8th graders = 62.7 × √(1 − .95) ≈ 14.02.
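These figures are straightforward to verify; the sketch below simply applies the same formula to the standard deviations above and the total-test reliability of .95 reported in the next section. It is a worked check of the arithmetic, not part of any ETS procedure.

import math

def standard_error_of_measurement(sd, reliability):
    # SEM = s * sqrt(1 - r)
    return sd * math.sqrt(1 - reliability)

print(round(standard_error_of_measurement(62.5, 0.95), 2))   # Grade 7 total score: 13.98
print(round(standard_error_of_measurement(62.7, 0.95), 2))   # Grade 8 total score: 14.02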

Evidence of reliability
The reliability of the TOEFL Junior was estimated using the Cronbach's alpha reliability coefficient.

Reliability Estimates of the TOEFL Junior Standard Test Scores
Listening Section                      .87
Language Form and Meaning Section      .87
Reading Section                        .89
Total Test                             .95
(TOEFL Junior Test Handbook, in press, p. 28)

Evidence of validity
In attesting to validity, the designers of the TOEFL Junior assert that all item writers and reviewers are highly qualified, are experts in matching items to content standards and relating parts of the test to each other, and come from diverse backgrounds (demographically, geographically, institutionally, and politically). As experts, they can avoid certain factors that may negatively influence validity as listed in Miller et al. (2008), such as unclear directions, ambiguous statements, improper arrangement of answers, or identifiable patterns of answers (p. 98). A pretest is used to determine Differential Item Functioning (DIF) for each item, and only low-DIF items are used in the final test. All complaints and reports are investigated, and a preliminary item analysis (PIA) is done to identify problematic items (too hard, too easy, poor discrimination) that should be changed (www.ets.org/toefljunior).

Because the TOEFL Junior is still relatively new, researchers are still in the process of conducting additional studies. According to Jennifer Hsiao, the product administrator of the TOEFL Junior, one study currently underway is "Validating the intended uses of TOEFL Junior Standard scores for placement decision in boarding schools in the United States." When this study is released, perhaps more conclusive data related to validity will be available.

Tests in Context

The situation I am envisioning takes place in a rural area of Vietnam, a country in which all three of the given tests are widely used. The thirty candidates are homogeneous in country of origin (Vietnam) and in native language (Vietnamese). However, they are of various ages, ranging from 15 to 50, and they come from diverse educational backgrounds as well. The younger candidates are currently in high school, while the older candidates either have no secondary education or have college degrees. All students have been taking general English courses for at least two years and therefore range from intermediate to advanced levels of proficiency.

Students' motivations for taking these general English courses vary widely. Some students are attempting to immigrate to an English-speaking country. Many younger students hope to study abroad in an English-speaking country and want to be able to communicate. Many of the higher-proficiency students plan to get jobs that will require them to use English. Though their goals are very different, many students had to take this class because it was either the only one they could afford or the only class that was close enough to their homes. Regardless of their goals, however, all students seem highly motivated.

Unfortunately, because of the remoteness of the location, materials to prepare for English language proficiency tests are hard to come by. Though the area does have a library, it is not very extensive, and few of the books are written in English.


The teacher also finds it too expensive to order and import books. Therefore, the students rely on the teacher to distribute examples she can find for free on the internet.

Based on this specific situation, I believe that the IELTS would be the best choice. First of all, the IELTS is designed to cover the entire range of language proficiency, from non-user to expert, and in the proposed situation the students are of many different proficiency levels. Furthermore, while the TOEFL Junior is designed solely for young students and the TOEIC is designed mainly for workers, the IELTS can be applied to both sectors. The IELTS is widely accepted and is gaining in popularity. Therefore, all goals in the proposed context could be accommodated by use of the IELTS. While the TOEIC is mainly geared toward adult learners and the TOEFL Junior is designed solely for students aged 11-15, the IELTS can be used by learners as young as 16 and is also appropriate for adults.

Though all of the tests examined above have adequate reliability and validity data, the IELTS had the highest reported composite reliability coefficient (0.96) and was the only test of the three whose publisher made recent research statistics widely available. Furthermore, though all three tests assert the expert status and rigorous training of their test developers and raters, the IELTS was the only test of the three whose research database demonstrated several validity studies.

Because the hypothetical learners would also lack access to test-taking strategy books and would rely mostly on material from the internet, I believe they would score best on the IELTS, which uses authentic texts as a basis for its tasks. It would be more likely that students would be familiar with the content and task types given in this test than in the other two, enhancing their chances of success.


Lastly, the IELTS would be the most cost-effective option for the hypothetical students. To order TOEFL Junior tests (even if the test were appropriate for these students) would cost $1,200 for the thirty candidates (30 tests at the 6-100-copy rate of $40 each). While the TOEIC and the IELTS are similar in price, the IELTS is typically $5-10 cheaper. Though the IELTS takes almost an hour longer to complete, few if any test centers offer the option of taking the TOEIC Listening and Reading test and the Speaking and Writing test one after the other, which is most likely less practical than taking one long test all at once. Additionally, IELTS manuals are widely available, making it easy for ESL teachers to help students prepare for the test.


References

Bachman, L., & Palmer, A. (1996). Language testing in practice. New York: Oxford University Press.

Handbook for the TOEFL Junior standard test. (in press). Princeton, NJ: Educational Testing Service.

Humphreys, P., Haugh, M., Fenton-Smith, B., Lobo, A., Michael, R., & Walkinshaw, I. (2010). Tracking international students' English proficiency over the first semester of undergraduate study. IDP: IELTS Australia.

IELTS handbook. (2007). Pasadena, CA: IELTS International.

Ito, T., Kawaguchi, K., & Ohta, R. (2005). A study of the relationship between TOEIC scores and functional job performance: Self-assessment of foreign language policy. Princeton, NJ: Educational Testing Service.

Liao, C., & Wei, Y. (2010). Statistical analyses for the TOEIC Speaking and Writing pilot study. Princeton, NJ: Educational Testing Service.

Miller, D., Linn, R., & Gronlund, N. (2008). Measurement and assessment in teaching (10th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.

O'Sullivan, B. (2005). International English Language Testing System (IELTS). In S. Stoynoff & C. Chapelle (Eds.), ESOL tests and testing (pp. 73-78). Alexandria, VA: TESOL.

Powers, D., Kim, H., Yu, F., Weng, V. Z., & VanWinkle, W. (2009). The TOEIC Speaking and Writing tests: Relations to test-taker perceptions of proficiency in English. Princeton, NJ: Educational Testing Service.

Powers, D. (2010). Validity: What does it mean for TOEIC tests? Princeton, NJ: Educational Testing Service.

Schmitt, D. (2005). Test of English for International Communication. In S. Stoynoff & C. Chapelle (Eds.), ESOL tests and testing (pp. 100-102). Alexandria, VA: TESOL.

TOEIC examinee handbook: Listening and reading. (2012). Princeton, NJ: Educational Testing Service.

TOEIC examinee handbook: Speaking and writing. (2009). Princeton, NJ: Educational Testing Service.

The TOEIC test: Test of international communication: Report on test takers worldwide. (2005). Princeton, NJ: Educational Testing Service.

Wolf, M., & Steinberg, J. (2011). An examination of United States middle school students' performance on TOEFL Junior. Princeton, NJ: Educational Testing Service.

Wolf, M., & Steinberg, J. (2012). The cross-national comparability of the internal structure of the TOEFL Junior test. Princeton, NJ: Educational Testing Service.


Appendix A
Means and Standard Deviations of TOEIC Listening and Reading Tests (According to Native Country)


Table taken directly from: The TOEIC test: Test of international communication: Report on test takers worldwide. (2005). Princeton, NJ: Educational Testing Service.
