ABSTRACT
This article focuses on the five principles of language assessment: validity, reliability, practicality, authenticity, and washback. It begins with an in-depth explanation of each principle, then describes standards for designing an assessment that is valid, reliable, practical, and authentic and that produces beneficial washback. Finally, it presents statistical tools that can be used to check these qualities in an assessment. The article uses a document analysis method: its information is drawn from books and journal articles related to the five principles of language assessment. The discussion shows that designing an assessment is not easy. Test designers therefore need a good understanding of the five principles, of how to design a test based on them, and of the statistical tools used to make an assessment valid, reliable, practical, and authentic and to give it beneficial washback.
Keywords: validity, reliability, practicality, authenticity, washback
A. INTRODUCTION
English is a foreign language in Indonesia. It is studied from elementary school through university, and the government of Indonesia and private institutions are working to improve the teaching and learning of English.
Education involves several aspects of the teaching and learning process that cannot be separated from one another: the teachers, the students, the materials, the methods, and the media of teaching and learning. Teachers usually prepare lesson plans before delivering material to their students; they choose methods for conducting the lessons and select instruments to find out the results of their teaching and learning activities.
According to Boud (2007, in Raymond et al., 2012), assessment is a fundamental aspect of learning, both in simulated and real-life situations, and has been said to be the single most powerful influence on learning in formal university courses. In other words, assessment is a process by which information is obtained relative to some known objective or goal. Assessment is a broad term that includes testing: a test is a special form of assessment, made under contrived circumstances especially so that it may be administered. All tests are assessments, but not all assessments are tests. We test at the end of a lesson or unit; we assess progress at the end of a school year through testing; and we assess verbal and quantitative skills through such instruments as the SAT and GRE. Whether implicit or explicit, assessment is most usefully connected to some goal or objective for which the assessment is designed. A test or assessment yields information relative to an objective or goal; in that sense, we test or assess to determine whether or not an objective or goal has been attained. Assessment of skill attainment is rather straightforward: either the skill exists at some acceptable level or it does not, and skills are readily demonstrable. Assessment of understanding is much more difficult and complex. Skills can be practiced; understandings cannot. We can assess a person's knowledge in a variety of ways, but there is always a leap, an inference that we make about what a person does in relation to what it signifies about what he knows. In terms of behavioral verbs, to assess means to stipulate the conditions by which the behavior specified in an objective may be ascertained; such stipulations are usually in the form of written descriptions.
Assessment plays an important role in the learning process. It is a key component because it helps students learn: when students can see how they are doing in a class, they can determine whether or not they understand the material. From the teacher's point of view, assessment provides information about every aspect of the students' development, especially their achievement. Brown (2004) defines assessment as an ongoing process that encompasses a much wider domain than testing; it covers the whole of the students' learning process. Assessment is becoming increasingly important in language education. There are five principles of language assessment: validity, reliability, practicality, authenticity, and washback. It is important to know these five major principles because they make it easier to design classroom assessments and to apply them appropriately. In applying these principles, we need to consider the purposes for which each of them is used.
B. RESEARCH METHOD
This research was descriptive qualitative research using a content analysis design. Chelimsky (1989) states that content analysis is a set of procedures for collecting and organizing information in a standardized format that allows analysts to make inferences about the characteristics and meaning of written and other recorded materials. Thus, in this research, the researcher found and explained relevant information by deeply investigating a number of journals. These data were analyzed in order to extract, from qualified books and articles, the key information about the principles of language assessment (validity, reliability, practicality, authenticity, and washback), about how to make an assessment valid, reliable, practical, and authentic with beneficial washback, and about the statistical tools needed to do so.
C. DISCUSSION
1. Principles in language assessment
Teachers need to consider five principles of language assessment when they design and evaluate tests:
• Validity
• Reliability
• Practicality
• Authenticity
• Washback
These principles, which are all of equal importance, may be used to evaluate a
designed assessment.
a. Validity
Messick (1998) defined validity as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. A good test should be valid, or accurate. Several experts have defined the term: for example, the validity of a test is the extent to which it measures what it is supposed to measure, and the relationship between test performance and other types of performance in other contexts is considered. According to Brown (2004), validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment. It can be inferred that a valid test elicits the particular abilities of students that it is intended to elicit; in other words, a valid test measures what it is supposed to measure.
b. Reliability
A reliable test is consistent and dependable. If you give the same test to the same student, or to matched students, on two different occasions, the test should yield similar results. A reliable test is consistent in its conditions across two or more administrations; gives clear directions for scoring/evaluation; has uniform rubrics for scoring/evaluation; lends itself to consistent application of those rubrics by the scorer; and contains items/tasks that are unambiguous to the test-taker.
Reliability refers to consistency and dependability: the same test delivered to the same student at different times should yield the same results. Several factors affect reliability. The first is student-related reliability: personal factors such as motivation, illness, or anxiety can keep students from showing their 'real' performance. The second is rater reliability: either intra-rater or inter-rater inconsistency can introduce subjectivity, error, or bias when tests are scored. The third is test administration reliability: the same test administered on different occasions can produce different results. The last is test reliability itself, which concerns the duration of the test and its instructions. If a test takes a long time to complete, fatigue, confusion, or exhaustion may affect the test takers' performance, and some test takers do not perform well on timed tests. Test instructions must be clear for all test takers, since they are already under mental pressure.
c. Practicality
Practicality refers to evaluating the assessment in terms of cost, time needed, and usefulness. This principle is important for classroom teachers. It covers the logistical, down-to-earth administrative issues involved in making, giving, and scoring an assessment instrument: in terms of time and energy, tests should be efficient to construct, administer, and evaluate.
d. Authenticity
Bachman and Palmer (1996) defined authenticity as the degree of correspondence of the characteristics of a given language test task to the features of a target language task (see also Brown & Abeywickrama, 2010, p. 37, in Yoneda, 2012). Several things must be considered in making a test authentic: the language used in the test should be natural, the items should be contextualized rather than isolated, the topics should be meaningful and interesting for the learners, the items should be organized thematically, and the tasks should be based on the real world.
2. How to make an assessment valid, reliable, practical, and authentic, with beneficial washback
Designing an assessment that is valid, reliable, practical, and authentic and that produces beneficial washback is not easy. Each of these principles has its own standards that the teacher should consider in order to apply it appropriately in the teaching and learning process. The teacher should therefore understand how to design an assessment that meets all five principles.
Validity is arguably the most important criterion for the quality of a test. The
term validity refers to whether or not the test measures what it claims to measure.
On a test with high validity the items will be closely linked to the test's intended
focus. For many certification and licensure tests this means that the items will be
highly related to a specific job or occupation. If a test has poor validity then it does
not measure the job-related content and competencies it ought to. When this is the
case, there is no justification for using the test results for their intended purpose.
There are several ways to estimate the validity of a test including content validity,
concurrent validity, and predictive validity. The face validity of a test is sometimes
also mentioned.
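One of these estimates, concurrent validity, can be illustrated with a short calculation: it is commonly expressed as the correlation between scores on the test under study and scores on an established criterion measure taken at about the same time. The Python sketch below uses invented scores purely as an illustration; the test names and numbers are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented scores: six students on a new classroom test and on an
# established proficiency test (the criterion).
new_test = [55, 62, 70, 48, 81, 66]
criterion = [50, 60, 72, 45, 85, 64]

# A correlation near 1 suggests the new test ranks students much as
# the criterion does, i.e. good concurrent validity.
print(round(pearson_r(new_test, criterion), 2))
```

A correlation close to zero, by contrast, would suggest that the new test is measuring something different from the criterion.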
Reliability is one of the most important elements of test quality. It has to do
with the consistency, or reproducibility, of an examinee's performance on the test.
For example, if you were to administer a test with high reliability to an examinee
on two occasions, you would be very likely to reach the same conclusions about
the examinee's performance both times. A test with poor reliability, on the other
hand, might result in very different scores for the examinee across the two test
administrations. If a test yields inconsistent scores, it may be unethical to take any
substantive actions on the basis of the test. There are several methods for
computing test reliability including test-retest reliability, parallel forms reliability,
decision consistency, internal consistency, and interrater reliability. For many
criterion-referenced tests decision consistency is often an appropriate choice.
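As an illustration of one internal-consistency estimate, a split-half calculation correlates each student's total on the odd-numbered items with the total on the even-numbered items, then applies the Spearman-Brown correction for full test length. The item scores below are invented for illustration.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """Correlate odd-item totals with even-item totals, then apply
    the Spearman-Brown correction for full test length."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

# Invented data: six students' item scores (1 = correct) on an 8-item test.
items = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
]
print(round(split_half_reliability(items), 2))
```

Test-retest reliability could be estimated with the same `pearson_r` helper, correlating scores from two administrations of the same test.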
For an assessment to be practical, the test must not be excessively expensive, must stay within appropriate time constraints, must be relatively easy to administer, and must have a scoring/evaluation procedure that is specific and time-efficient. A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time and money than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.
Authenticity is defined as the degree of correspondence of the characteristics of a given language test task to the features of a target language task. Brown (2004: 28) points out that authenticity is present in a test in the following ways: the language in the test is as natural as possible; items are contextualized rather than isolated; topics are meaningful (relevant, interesting) for the learner; some thematic organization of items is provided, such as through a story line or episode; and tasks represent, or closely approximate, real-world tasks.
The last principle is washback, the effect of testing on teaching and learning. Beneficial washback positively influences what and how teachers teach; positively influences what and how learners learn; offers learners a chance to prepare adequately; gives learners feedback that enhances their language development; is more formative in nature than summative; and provides conditions for peak performance by the learner. To design a test with good washback, the teacher should give students feedback after assessing their work. For example, when students perform a speech as a speaking task, the teacher should follow the performance with comments and suggestions for improving the students' language ability.
3. Statistical tools to make an assessment valid, reliable, practical, and authentic
Language teachers can become very effective as assessors without becoming
statisticians.
On the other hand, language teachers do benefit from at least a basic awareness of statistics: of the principles that inform them, if not the details of the calculations. This knowledge can help them to understand the meaning of external test scores, to improve the quality of the assessment materials they use, and to carry out effective classroom research.
Several statistical concepts are useful in interpreting test scores. The first is the normal distribution, something that is very widely observed in nature. If you look around you in any busy street you will notice a few very tall people and a few who are quite short, but most will be quite close to average height. Similarly, if you give a language test to people who have all been studying for a similar length of time, their scores will tend to cluster around the average, or mean, score for the group.
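This clustering can be summarized with the mean and the standard deviation; in a roughly normal distribution, most scores fall within one standard deviation of the mean. A minimal sketch, using invented class scores:

```python
import statistics

# Invented scores from one class on a language test.
scores = [48, 55, 60, 61, 63, 64, 65, 65, 66, 68, 70, 72, 77, 85]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# Count how many scores lie within one standard deviation of the mean.
within_one_sd = [s for s in scores if mean - sd <= s <= mean + sd]
print(round(mean, 1), round(sd, 1), f"{len(within_one_sd)}/{len(scores)}")
```

Here most of the class falls within one standard deviation of the mean, with a few much lower and much higher scores at the tails, as the normal distribution predicts.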
Second is using standardized scores in grading. A problem that often comes
up for teachers is that scores from different assessments have to be combined to
generate an overall result for a student over a term, semester or year’s work.
Standardizing scores offers one way of putting them all onto a comparable scale so that they can be aggregated in a relatively objective way. Suppose a teacher wanted to base an overall score for the semester on both a TEA test (scored out of 100) and an EAT test (scored out of 200). If the raw scores were simply added together, the EAT would carry more weight in the final score than the TEA; standardizing the scores equalizes the contribution of each. Similarly, on assessments with a relatively narrow spread of scores (such as tests of speaking skills scored on a rating scale), an excellent performance might be only a few points above an average performance, while a good grammar test score could be 20 points above the mean. Standardizing the scores gives the excellent speaking performance the credit it deserves.
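One common way to standardize is the z-score: each raw score minus the group mean, divided by the standard deviation. The sketch below uses invented TEA and EAT scores (the real score distributions are not given in this article) to show how z-scores put the two tests on an equal footing before they are combined.

```python
import statistics

def z_scores(raw):
    """Standardize: subtract the group mean, divide by the standard deviation."""
    m, s = statistics.mean(raw), statistics.stdev(raw)
    return [(x - m) / s for x in raw]

# Invented scores for five students on the two tests.
tea = [79, 35, 60, 55, 71]    # TEA, out of 100
eat = [108, 34, 90, 80, 118]  # EAT, out of 200

# Adding raw scores would let the EAT (larger scale) dominate;
# adding z-scores weights the two tests equally.
combined = [t + e for t, e in zip(z_scores(tea), z_scores(eat))]
for i, c in enumerate(combined):
    print(f"student {i}: combined z = {c:+.2f}")
```

Each student's combined z-score now reflects how far above or below average they were on each test, regardless of the tests' different maximum scores.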
Plotting scores also makes such comparisons easy. For example, Carla's score of 79% on the TEA test places her three quarters of the way along the horizontal axis, while her score of 54% on the EAT test puts her just above halfway up the vertical axis. To compare Carla's performance on the two tests with that of another student, Luigi, we can add his scores to the picture: he scored 35 on the TEA test and 17 on the EAT test. The picture shows quite clearly that Carla has done better than Luigi on both tests.
Other statistical tools, such as Cronbach's alpha, ANOVA, and Likert scales, can also be used to help make an assessment valid, reliable, practical, and authentic. In particular, reliability can be assessed with Cronbach's alpha. According to Bland (1997), Cronbach's alpha is a test of internal consistency and is frequently used to calculate the correlation among the answers on an assessment tool.
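A minimal sketch of the calculation, using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), with invented Likert-scale responses:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of the item variances
    divided by the variance of the students' total scores)."""
    k = len(item_scores[0])
    item_vars = [statistics.variance([row[i] for row in item_scores])
                 for i in range(k)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented data: five students' answers to four Likert-scale items (1-5).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(responses), 2))
```

Values closer to 1 indicate that the items answer consistently with one another; a conventional rule of thumb treats alpha above roughly 0.7 as acceptable internal consistency.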
D. CONCLUSION
Based on the discussion above, teachers need to evaluate their assessment tools according to the five cardinal criteria when designing or administering a test. A test gives us information about whether it is valid, reliable, practical, and authentic, and about whether it produces beneficial washback.
REFERENCES
Brown, J. D. (1996). Testing in Language Programs. Upper Saddle River, NJ: Prentice Hall Regents.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.
Brown, H. D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). New York: Longman.
Hughes, A. (1993). Backwash and TOEFL 2000. Unpublished manuscript, University of Reading.
Raymond, J. E., Homer, C. S. E., Smith, R., & Gray, J. E. (2012). Learning through authentic assessment: An evaluation of a new development in the undergraduate midwifery curriculum. Nurse Education in Practice. doi:10.1016/j.nepr.2012.10.006
Straub, D., Boudreau, M.-C., & Gefen, D. (2004). Validation guidelines for IS positivist research. Communications of the Association for Information Systems, 13, 380-427.
Sullivan, G. M. (2011). A primer on the validity of assessment instruments. Journal of Graduate Medical Education. doi:10.4300/JGME-D-11-00075.1