ABSTRACT
This article focuses on the five principles of language assessment: validity, reliability, practicality, authenticity, and washback. It begins with an in-depth explanation of each principle, then describes standards for designing an assessment that is valid, reliable, practical, and authentic and that produces beneficial washback. Finally, it presents statistical tools that can be used to check these qualities in an assessment. The article uses a document analysis method: its information is drawn from books and journal articles related to the five principles of language assessment. The discussion shows that designing an assessment is not easy. Test designers therefore need a good understanding of the five principles, of how to design a test based on them, and of the statistical tools used to make an assessment valid, reliable, practical, and authentic and to give it beneficial washback.
Keywords: validity, reliability, practicality, authenticity, washback
A. INTRODUCTION
English is a foreign language in Indonesia. It is studied from elementary school through university, and the government of Indonesia and private institutions are working to improve the teaching and learning of English.
Education involves several aspects of the teaching and learning process that cannot be separated from one another: the teachers, the students, the materials, the methods, and the media of teaching and learning. Teachers usually prepare lesson plans before delivering material to their students; they choose methods for conducting the lessons and select instruments to find out the results of their teaching and learning activities.
According to Boud (2007, in Raymond et al., 2012), assessment is a fundamental aspect of learning, both in simulated and real-life situations, and has been said to be the single most powerful influence on learning in formal university courses. In other words, assessment is a process by which information is obtained relative to some known objective or goal. Assessment is a broad term that includes testing: a test is a special form of assessment, made under contrived circumstances especially so that it may be administered. All tests are assessments, but not all assessments are tests. We test at the end of a lesson or unit; we assess progress at the end of a school year through testing; and we assess verbal and quantitative skills through such instruments as the SAT and GRE. Whether implicit or explicit, assessment is most usefully connected to some goal or objective for which the assessment is designed. A test or assessment yields information relative to an objective or goal; in that sense, we test or assess to determine whether or not an objective or goal has been attained. Assessment of skill attainment is rather straightforward: either the skill exists at some acceptable level or it does not, and skills are readily demonstrable. Assessment of understanding is much more difficult and complex. Skills can be practiced; understandings cannot. We can assess a person's knowledge in a variety of ways, but there is always a leap, an inference that we make about what a person does in relation to what it signifies about what he knows. In terms of behavioral verbs, to assess means to stipulate the conditions by which the behavior specified in an objective may be ascertained; such stipulations are usually in the form of written descriptions.
Assessment plays an important role in the learning process. It is a key component because it helps students learn: when students can see how they are doing in a class, they can determine whether or not they understand the material. From the teacher's point of view, assessment provides information about every aspect of the students' development, especially their achievement. Brown (2004) defines assessment as an ongoing process that encompasses a much wider domain than testing; it covers the whole of the students' learning process. Assessment is becoming increasingly important in language education. There are five principles of language assessment: validity, reliability, practicality, authenticity, and washback. It is important to know these five major principles because they make it easier to design classroom assessments and to apply them appropriately. In applying these principles, we need to consider the purposes for which each of them is used.
B. RESEARCH METHOD
This research was descriptive qualitative research using a content analysis design. Chelimsky (1989) states that content analysis is a set of procedures for collecting and organizing information in a standardized format that allows analysts to make inferences about the characteristics and meaning of written and other recorded materials. Thus, in this research, the researcher found and explained relevant information by deeply investigating a number of journals. These data were analyzed in order to extract, from qualified books and articles, the key information about the principles of language assessment (validity, reliability, practicality, authenticity, and washback), about how to make an assessment valid, reliable, practical, and authentic with beneficial washback, and about the statistical tools needed to do so.
C. DISCUSSION
1. Principles in language assessment
Teachers need to consider five principles of language assessment when they design and evaluate tests:
• Validity
• Reliability
• Practicality
• Authenticity
• Washback
These principles, which are all of equal importance, may be used to evaluate a
designed assessment.
a. Validity
Messick (1998) defined validity as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. A good test should be valid, or accurate. Several experts have defined the term: for example, the validity of a test is the extent to which it measures what it is supposed to measure, and the relationship between test performance and other types of performance in other contexts is considered. According to Brown (2004), validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment. It can be inferred that a valid test elicits the particular abilities of students that it is intended to elicit; in other words, a valid test measures what it is supposed to measure.
b. Reliability
A reliable test is consistent and dependable. If you give the same test to the same student, or to matched students, on two different occasions, the test should yield similar results. A reliable test is consistent in its conditions across two or more administrations; gives clear directions for scoring/evaluation; has uniform rubrics for scoring/evaluation; lends itself to consistent application of those rubrics by the scorer; and contains items/tasks that are unambiguous to the test-taker.
Reliability refers to consistency and dependability: the same test delivered to the same student at different times should yield the same results. Several factors affect reliability. The first is student-related reliability: personal factors such as motivation, illness, or anxiety can keep students from showing their 'real' performance. The second is rater reliability: either intra-rater or inter-rater inconsistency can introduce subjectivity, error, or bias when tests are scored. The third is test administration reliability: the same test administered on different occasions can produce different results. The last is test reliability itself, which concerns the duration of the test and its instructions. If a test takes a long time to complete, fatigue, confusion, or exhaustion may affect the test takers' performance, and some test takers do not perform well on timed tests. Test instructions must be clear for all test takers, since they are already under mental pressure.
c. Practicality
Practicality refers to evaluating the assessment in terms of cost, time needed, and usefulness. This principle is important for classroom teachers. It covers the logistical, down-to-earth administrative issues involved in making, giving, and scoring an assessment instrument: in terms of time and energy, tests should be efficient to construct, administer, and evaluate.
d. Authenticity
Bachman and Palmer (1996) defined authenticity as the degree of correspondence of the characteristics of a given language test task to the features of a target language task (see also Brown & Abeywickrama, 2010, p. 37, in Yoneda, 2012). Several things must be considered in making a test authentic: the language used in the test should be natural, the items should be contextualized rather than isolated, the topics should be meaningful and interesting for the learners, the items should be organized thematically, and the tasks should be based on the real world.
2. How to make an assessment valid, reliable, practical, and authentic, with beneficial washback
Designing an assessment that is valid, reliable, practical, and authentic and that produces beneficial washback is not easy. Each of these principles has its own standards that the teacher should consider in order to apply it appropriately in the teaching and learning process. The teacher should therefore understand how to design an assessment that meets all five principles.
Validity is arguably the most important criterion for the quality of a test. The
term validity refers to whether or not the test measures what it claims to measure.
On a test with high validity the items will be closely linked to the test's intended
focus. For many certification and licensure tests this means that the items will be
highly related to a specific job or occupation. If a test has poor validity then it does
not measure the job-related content and competencies it ought to. When this is the
case, there is no justification for using the test results for their intended purpose.
There are several ways to estimate the validity of a test including content validity,
concurrent validity, and predictive validity. The face validity of a test is sometimes
also mentioned.
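One of these estimates, concurrent validity, can be illustrated with a short calculation: it is commonly expressed as the correlation between scores on the test under study and scores on an established criterion measure taken at about the same time. The Python sketch below uses invented scores purely as an illustration; the test names and numbers are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented scores: six students on a new classroom test and on an
# established proficiency test (the criterion).
new_test = [55, 62, 70, 48, 81, 66]
criterion = [50, 60, 72, 45, 85, 64]

# A correlation near 1 suggests the new test ranks students much as
# the criterion does, i.e. good concurrent validity.
print(round(pearson_r(new_test, criterion), 2))
```

A correlation close to zero, by contrast, would suggest that the new test is measuring something different from the criterion.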
Reliability is one of the most important elements of test quality. It has to do
with the consistency, or reproducibility, of an examinee's performance on the test.
For example, if you were to administer a test with high reliability to an examinee
on two occasions, you would be very likely to reach the same conclusions about
the examinee's performance both times. A test with poor reliability, on the other
hand, might result in very different scores for the examinee across the two test
administrations. If a test yields inconsistent scores, it may be unethical to take any
substantive actions on the basis of the test. There are several methods for
computing test reliability including test-retest reliability, parallel forms reliability,
decision consistency, internal consistency, and interrater reliability. For many
criterion-referenced tests decision consistency is often an appropriate choice.
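As an illustration of one internal-consistency estimate, a split-half calculation correlates each student's total on the odd-numbered items with the total on the even-numbered items, then applies the Spearman-Brown correction for full test length. The item scores below are invented for illustration.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """Correlate odd-item totals with even-item totals, then apply
    the Spearman-Brown correction for full test length."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

# Invented data: six students' item scores (1 = correct) on an 8-item test.
items = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
]
print(round(split_half_reliability(items), 2))
```

Test-retest reliability could be estimated with the same `pearson_r` helper, correlating scores from two administrations of the same test.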
For an assessment to be practical, the test must not be excessively expensive, must stay within appropriate time constraints, must be relatively easy to administer, and must have a scoring/evaluation procedure that is specific and time-efficient. A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time and money than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.
Authenticity is defined as the degree of correspondence of the characteristics of a given language test task to the features of a target language task. Brown (2004: 28) points out that authenticity is present in a test in the following ways: the language in the test is as natural as possible; items are contextualized rather than isolated; topics are meaningful (relevant, interesting) for the learner; some thematic organization of items is provided, such as through a story line or episode; and tasks represent, or closely approximate, real-world tasks.
The last principle is washback, the effect of testing on teaching and learning. Beneficial washback positively influences what and how teachers teach; positively influences what and how learners learn; offers learners a chance to prepare adequately; gives learners feedback that enhances their language development; is more formative in nature than summative; and provides conditions for peak performance by the learner. To design a test with good washback, the teacher should give students feedback after assessing their work. For example, when students perform a speech as a speaking task, the teacher should follow the performance with comments and suggestions for improving the students' language ability.
3. Statistical tools to make an assessment valid, reliable, practical, and authentic
Language teachers can become very effective as assessors without becoming
statisticians.
On the other hand, language teachers do benefit from at least a basic awareness of statistics: of the principles that inform them, if not the details of the calculations. This knowledge can help them to understand the meaning of external test scores, to improve the quality of the assessment materials they use, and to carry out effective classroom research.
Several statistical concepts are useful in interpreting test scores. The first is the normal distribution, something that is very widely observed in nature. If you look around you in any busy street you will notice a few very tall people and a few who are quite short, but most will be quite close to average height. Similarly, if you give a language test to people who have all been studying for a similar length of time, their scores will tend to cluster around the average, or mean, score for the group.
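This clustering can be summarized with the mean and the standard deviation; in a roughly normal distribution, most scores fall within one standard deviation of the mean. A minimal sketch, using invented class scores:

```python
import statistics

# Invented scores from one class on a language test.
scores = [48, 55, 60, 61, 63, 64, 65, 65, 66, 68, 70, 72, 77, 85]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# Count how many scores lie within one standard deviation of the mean.
within_one_sd = [s for s in scores if mean - sd <= s <= mean + sd]
print(round(mean, 1), round(sd, 1), f"{len(within_one_sd)}/{len(scores)}")
```

Here most of the class falls within one standard deviation of the mean, with a few much lower and much higher scores at the tails, as the normal distribution predicts.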
Second is using standardized scores in grading. A problem that often comes
up for teachers is that scores from different assessments have to be combined to
generate an overall result for a student over a term, semester or year’s work.
Standardizing scores offers one way of putting them all onto a comparable scale so that they can be aggregated in a relatively objective way. Suppose a teacher wanted to base an overall score for the semester on both a TEA test (scored out of 100) and an EAT test (scored out of 200). If the raw scores were simply added together, the EAT would carry more weight in the final score than the TEA; standardizing the scores equalizes the contribution of each. Similarly, on assessments with a relatively narrow spread of scores (such as tests of speaking skills scored on a rating scale), an excellent performance might be only a few points above an average performance, while a good grammar test score could be 20 points above the mean. Standardizing the scores gives the excellent speaking performance the credit it deserves.
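One common way to standardize is the z-score: each raw score minus the group mean, divided by the standard deviation. The sketch below uses invented TEA and EAT scores (the real score distributions are not given in this article) to show how z-scores put the two tests on an equal footing before they are combined.

```python
import statistics

def z_scores(raw):
    """Standardize: subtract the group mean, divide by the standard deviation."""
    m, s = statistics.mean(raw), statistics.stdev(raw)
    return [(x - m) / s for x in raw]

# Invented scores for five students on the two tests.
tea = [79, 35, 60, 55, 71]    # TEA, out of 100
eat = [108, 34, 90, 80, 118]  # EAT, out of 200

# Adding raw scores would let the EAT (larger scale) dominate;
# adding z-scores weights the two tests equally.
combined = [t + e for t, e in zip(z_scores(tea), z_scores(eat))]
for i, c in enumerate(combined):
    print(f"student {i}: combined z = {c:+.2f}")
```

Each student's combined z-score now reflects how far above or below average they were on each test, regardless of the tests' different maximum scores.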
Plotting scores also makes such comparisons easy. For example, Carla's score of 79% on the TEA test places her three quarters of the way along the horizontal axis, while her score of 54% on the EAT test puts her just above halfway up the vertical axis. To compare Carla's performance on the two tests with that of another student, Luigi, we can add his scores to the picture: he scored 35 on the TEA test and 17 on the EAT test. The picture shows quite clearly that Carla has done better than Luigi on both tests.
Other statistical tools, such as Cronbach's alpha, ANOVA, and Likert scales, can also be used to help make an assessment valid, reliable, practical, and authentic. In particular, reliability can be assessed with Cronbach's alpha. According to Bland (1997), Cronbach's alpha is a test of internal consistency and is frequently used to calculate the correlation among the answers on an assessment tool.
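A minimal sketch of the calculation, using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), with invented Likert-scale responses:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of the item variances
    divided by the variance of the students' total scores)."""
    k = len(item_scores[0])
    item_vars = [statistics.variance([row[i] for row in item_scores])
                 for i in range(k)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented data: five students' answers to four Likert-scale items (1-5).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(responses), 2))
```

Values closer to 1 indicate that the items answer consistently with one another; a conventional rule of thumb treats alpha above roughly 0.7 as acceptable internal consistency.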
D. CONCLUSION
Based on the discussion above, teachers need to evaluate their assessment tools according to the five cardinal criteria when designing or administering a test. A test gives us information about whether it is valid, reliable, practical, and authentic, and about whether it produces beneficial washback.
REFERENCES
Brown, J. D. (1996). Testing in Language Programs. Upper Saddle River, NJ: Prentice Hall Regents.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.
Brown, H. D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). New York: Longman.
Hughes, A. (1993). Backwash and TOEFL 2000. Unpublished manuscript, University of Reading.
Raymond, J. E., Homer, C. S. E., Smith, R., & Gray, J. E. (2012). Learning through authentic assessment: An evaluation of a new development in the undergraduate midwifery curriculum. Nurse Education in Practice. doi:10.1016/j.nepr.2012.10.006
Straub, D., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research. Communications of the Association for Information Systems, 13, 380-427.
Sullivan, G. M. (2011). A primer on the validity of assessment instruments. Journal of Graduate Medical Education. doi:10.4300/JGME-D-11-00075.1