
ADVANCED ASSESSMENT

ON LANGUAGE TEACHING

BASIC PRINCIPLES OF ASSESSMENT

PAPER

Created by:

ABDUL ROHMAN SIDIK 201720560211027

MASTER OF ENGLISH LANGUAGE EDUCATION

UNIVERSITY OF MUHAMMADIYAH MALANG

2018
I. Introduction
This paper explores how principles of language assessment can and should be applied to formal tests, with the ultimate recognition that these principles also apply to assessments of all kinds. In this discussion, the principles will be used to evaluate an existing, previously published, or newly created test.
How do you know if a test is effective? For the most part, that question can be answered by responding to such questions as: Can it accurately measure what you want it to measure? These and other questions help to identify five cardinal criteria for "testing a test": practicality, reliability, validity, authenticity, and washback. We will look at each one, with no priority order implied by the order of presentation.

1. Practicality
An effective test is practical. This means that it:
• is not excessively expensive,
• stays within appropriate time constraints,
• is relatively easy to administer, and
• has a scoring/evaluation procedure that is specific and time-efficient.
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations.
A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations. Practicality refers to the logistical, down-to-earth, administrative issues involved in making, giving, and scoring an assessment instrument. These include "costs, the amount of time it takes to construct and to administer, ease of scoring, and ease of interpreting/reporting the results" (Mousavi, 2009).

2. Reliability
A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of reliability may be addressed by considering a number of factors that can contribute to the unreliability of a test. Consider the following possibilities (adapted from Mousavi, 2002, p. 804): fluctuations in the student, in scoring, in test administration, and in the test itself.
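To make "similar results on two occasions" concrete, the following minimal Python sketch estimates test-retest reliability as the correlation between scores from two administrations of the same test. The score lists, and the rough 0.8 benchmark mentioned in the comment, are hypothetical illustrations, not figures from the sources cited above.

from statistics import correlation  # requires Python 3.10+

# Hypothetical scores for the same eight students on two occasions
first_administration = [78, 85, 62, 90, 71, 88, 67, 74]
second_administration = [80, 83, 65, 91, 69, 85, 70, 72]

# A coefficient near 1.0 suggests consistent, dependable results;
# values well below about 0.8 would raise reliability concerns.
r = correlation(first_administration, second_administration)
print(f"Test-retest reliability estimate: r = {r:.2f}")

The factors below explain where such inconsistency can come from.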
a. Student-Related Reliability
The most common learner-related issue in reliability is caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).
b. Rater Reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores of the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In one reported placement test, for example, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
Rater-reliability issues are not limited to contexts where two or more scorers are involved. Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. When I am faced with up to 40 tests to grade in only a week, I know that the standards I apply, however subliminally, to the first few tests will be different from those I apply to the last few. I may be "easier" or "harder" on those first few papers, or I may get tired, and the result may be an inconsistent evaluation across all tests. One solution to such intra-rater unreliability is to read through about half of the tests before rendering any final scores or grades, then to recycle back through the whole set of tests to ensure an even-handed judgment. In tests of writing skills, rater reliability is particularly hard to achieve since writing proficiency involves numerous traits that are difficult to define. The careful specification of an analytical scoring instrument, however, can increase rater reliability (J. D. Brown, 1991).
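Inter-rater consistency can also be checked numerically. The minimal Python sketch below (the two raters, the 1-5 analytic scale, and all scores are hypothetical illustrations) compares two scorers along three simple lines: whether they rank the essays similarly, whether one is systematically harsher, and how often they agree exactly.

from statistics import correlation, mean  # requires Python 3.10+

# Hypothetical scores from two raters for the same eight essays (1-5 scale)
rater_a = [4, 3, 5, 2, 4, 3, 5, 4]
rater_b = [4, 2, 5, 3, 4, 3, 4, 4]

# Correlation: do the raters rank the essays similarly?
r = correlation(rater_a, rater_b)
# Mean difference: is one rater systematically harsher than the other?
severity_gap = mean(rater_a) - mean(rater_b)
# Exact agreement: how often do the raters assign the identical score?
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Inter-rater correlation: r = {r:.2f}")
print(f"Mean severity gap (A - B): {severity_gap:+.2f}")
print(f"Exact agreement rate: {exact_agreement:.0%}")

An analytic scoring instrument of the kind Brown (1991) describes would be expected to improve all three figures, since it gives both raters the same explicit criteria to apply.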
c. Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of unreliability caused by the conditions of the test administration. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.
d. Test Reliability
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people (and you may be included in this category) who "know" the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items (those that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.
