2. Continuous assessment - takes place during different stages of the course (ongoing) to monitor learners' progress.
3. Formative assessment - the teacher is familiar with each and every learner, and
as we have seen above, can draw on a much wider range of evidence that informs
judgments about ability. The teacher interacts with each learner, and the purpose of the
interaction is not to assess in the most neutral, non-invasive way possible. Rather, it is to
assess the current abilities of the learner in order to decide what to do next, so that further
learning can take place. In traditional terminology, this makes classroom assessment
formative.
4. Summative assessment - it is conducted at the end of a programme of study to assess
whether and how far individuals or groups have been successful.
5. Proficiency testing - it is intended to measure the taker's level of proficiency
with no reference to any particular course.
Controlling factors in testing and evaluation (principles)
1. Reliability - the desired consistency (or reproducibility) of test scores is called
reliability. Whenever a test is administered, the test user would like some assurance that
the results could be replicated if the same individuals were tested again under similar
circumstances. Reliability rests on four assumptions:
Stability: the abilities of the test takers will not change dramatically over short
periods of time.
Discrimination: tests are constructed in such a way that they discriminate as well
as possible between the higher ability and lower ability test takers. However, in the
classroom the teacher does not often wish to separate out all individuals and rank-order
them, as it serves no pedagogic purpose. Rather, the teacher wishes to know if any
individual has achieved the goals of the syllabus and can move on to learn new material,
or whether it is necessary to recycle previous material.
Test length: the more items or tasks are included in the test, the higher the
reliability coefficient will be. In the classroom there is very little reason to wish to spend
many hours having learners take long tests, because teachers are constantly collecting
evidence about their progress. Some task types, usually involving performance, take
extended periods of time, and yet these still only count as one task - one piece of
evidence - when calculating reliability.
Homogeneity: the items are related or correlated to each other. So each piece of
information is independently contributing to the test score, and the test score is the best
possible representation of the knowledge, ability or skills of the test taker.
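The relationship between test length and the reliability coefficient mentioned above is usually expressed with the Spearman-Brown prophecy formula. The sketch below illustrates it; the starting reliability value and the item counts are hypothetical figures chosen only for illustration, not data from any real test.

```python
# Sketch: predicting how the reliability coefficient changes when a
# test is lengthened, using the Spearman-Brown prophecy formula.
# The reliability value 0.70 below is a hypothetical example.

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor
    (e.g. 2.0 = twice as many comparable items)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A 20-item test with reliability 0.70, doubled to 40 comparable items:
r_doubled = spearman_brown(0.70, 2.0)
print(round(r_doubled, 3))  # 0.824 - more items, higher reliability
```

Note that the formula assumes the added items are comparable in quality to the existing ones; a performance task counted as a single item adds little, which is why long performance tests do not automatically become more reliable.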
2. Validity - refers to the meaningfulness and appropriateness of the interpretations made
on the basis of test scores.
3. Authenticity - test tasks should reflect the test taker's language performance in real-life
situations, i.e. what learners can do in the L2.
4. Interactiveness - the extent and type of involvement of the test taker's
individual characteristics in accomplishing the test tasks (learning style, amount and
quality of knowledge, strategic competence, etc.).
5. Practicality - the resources (human and material) required for test design and
implementation. In other words, it is a question of whether the test can be developed and used
at all.
Techniques for alternative assessment: structured interviews, observing and recording,
unstructured interviews, creating portfolios, open group discussions, completing
self-reflection forms, brainstorming groups, writing essays to single prompts, keeping a
journal, recording peer evaluation, open questions, etc.
Integrative tests vs. discrete-point tests - although it has been argued that integrative-type
tests must be used to measure communicative competence, it seems that discrete-point
tests will also be used in the communicative approach. This is because such tests
may be more effective than integrative tests in making the learner aware of, and in
assessing the learner's control of, the separate components and elements of
communicative competence. This type of test would also seem to be easier to administer
and score in a reliable manner than a more integrative type of test. For example, a test
designed to assess grammatical accuracy might be considered to have more of a
discrete-point orientation.
The washback effect - the term describes the effect that tests have on what goes on in
the classroom, on the teaching and learning processes. For many years it was simply
assumed that good tests would produce good washback and, conversely, that bad tests
would produce bad washback. More recently, this assumption has been challenged, as
bad tests can produce good effects: teachers and learners do good things they would
not otherwise do - for example, prepare lessons more thoroughly, do their homework, take
the subject being tested more seriously, and so on.
Criterion-referenced assessment (CRT) vs. norm-referenced assessment (NRT)
CRT represents a test that measures knowledge, skill or ability in a specific domain.
Performance is usually measured against some existing criterion level of performance
(successful performance standards), above which the test taker is deemed to have
achieved mastery. NRT represents a test in which the score of any individual is
interpreted in relation to the scores of other test takers. Once the test is used in live
testing, any score is interpreted in terms of where it falls on the curve of normal
distribution established in the norm-setting study.
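The contrast between the two interpretations can be sketched in a few lines of Python: the same raw score is judged once against a fixed mastery criterion (CRT) and once against the norm group's distribution (NRT). The cut-off score, the norming-group mean and the standard deviation below are hypothetical values, not taken from any real test.

```python
# Sketch: criterion-referenced vs. norm-referenced interpretation of
# the same raw score. All numbers are hypothetical, for illustration.
from statistics import NormalDist

score = 68

# CRT: compare the score with a fixed criterion level of performance;
# at or above the cut-off, the test taker is deemed to have achieved mastery.
MASTERY_CUTOFF = 60  # hypothetical mastery standard
crt_result = "mastery" if score >= MASTERY_CUTOFF else "non-mastery"

# NRT: locate the score on the curve of normal distribution established
# in a (hypothetical) norm-setting study.
norm = NormalDist(mu=50, sigma=10)      # norming-group mean and SD
z = (score - norm.mean) / norm.stdev    # standard (z) score
percentile = norm.cdf(score) * 100      # percent of the norm group below

print(crt_result)            # mastery
print(round(z, 1))           # 1.8
print(round(percentile, 1))  # 96.4
```

The point of the sketch is that the CRT verdict would stay the same even if every other test taker scored higher, whereas the NRT percentile depends entirely on how the rest of the group performed.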