In Chapter 4 we argued that the topical knowledge of language users is always involved in language use. It follows that if language test tasks are authentic and interactional, and elicit instances of language use, test takers' topical knowledge will always be a factor in their test performance. Topical knowledge has traditionally been treated exclusively as a potential source of test bias, or invalidity, and the traditional practice in developing language tests is to design test tasks that will minimize, or control, the effect of test takers' topical knowledge on test performance. We take a slightly different view and would argue that this is not appropriate for all situations. Although topical knowledge may, in many situations, be a potential source of bias, there are other situations in which it may, in fact, be part of the construct the test developer wants to measure. One question that needs to be asked, then, is when does the test developer consider topical knowledge a potential source of bias, and when does he or she define it as part of the construct? This question is addressed in the first section of this chapter, under 'Making inferences'. It is in such situations that the test developer is most likely to include topical knowledge in the construct definition, and must consider what the possible options for doing so are.
We believe that there are essentially three options for defining the construct to be measured:
1. do not include topical knowledge in the construct definition,
2. include both topical knowledge and language ability in the construct definition, or
3. define topical knowledge and language ability as separate constructs.
Here we will discuss these three options for defining the construct, along with typical situations, intended inferences, potential problems, and possible solutions for each. We will list the considerations associated with each option in more or less outline form, and then illustrate each option with examples.

Option 1: Do not include topical knowledge in the construct definition

Typical situations
Where test takers are expected to have relatively widely varied topical knowledge:
- language programs, where inferences about language ability are to be used to make decisions about the test takers
- selection situations, where inferences about language ability will be one of several factors considered in the selection process
It is important to remember that even if the test developer has decided not to include topical knowledge in the construct definition, it may still be involved in test takers' performance. For a particular testing situation, the test developer may simply choose to focus on language ability alone.

Intended inference
Inferences will be made about language ability only.
Potential problem 1
Possible bias due to specific topical information in the task input; test takers who happen to have the relevant topical knowledge may be advantaged over those who do not.
Possible solutions
1. Include in the task input topical information that we expect none of the test takers to
know. This is a widely used solution in language tests. Potential problem: this
decontextualizes the test task, and thus probably biases the test against everyone to
some extent.
2. Include in the task input topical information that we expect will be familiar to all of the test takers. Potential advantage: this contextualizes the task for everyone. Potential problem, for test tasks aimed at assessing comprehension: test takers may be able to answer the questions on the basis of their topical knowledge alone, without having comprehended the language of the input.
3. Present test takers with several tasks, each with different topical content. There are two ways to do this: require every test taker to complete all of the tasks, or allow each test taker to choose among them.
We favor providing test takers with choices, since this is one way in which we can accommodate their specific interests. In addition, we feel that this gives them a greater sense of involvement in the testing process.
Potential problem 2
Possible ambiguous interpretation of low scores, and hence a problem of questionable construct validity. Specifically, if test takers get low scores, this could be due either to low language ability or to a lack of the relevant topical knowledge, or to both.
Possible solutions
1. In test tasks that engage test takers in listening or reading activities, and that elicit selected or limited production responses, clearly specify the components of language ability that each test task is designed to measure, and then design test tasks that focus on those components rather than on topical content.
2. In test tasks that engage test takers in speaking or writing activities that elicit extended production responses, use analytic rating scales for rating components of language ability, or conduct qualitative analyses of the test takers' responses. The use of rating scales is discussed further in Chapter 11.
Example 1
Suppose we wanted to assess the ability of applicants to a university to read academic material in English for purposes of selecting them for admission. Bearing in mind the variety of disciplines in any given university, we might
decide to develop a number of reading passages with different topical information and allow
test takers to choose several to which to respond. We would need to design types of test tasks
that focus on areas of language ability, and then write tasks for each of the different reading
passages.
We would most likely be interested in the students' ability to control certain forms of written
discourse and not the extent of their knowledge of the topic about which they happened to be
reading. We would therefore not include topical knowledge in the construct definition, and
would rate answers on the basis of components of language ability. (See Project 5 in Section Three, in which different test tasks are designed to measure different components of language ability.)
Example 2
Suppose we wanted to develop an achievement test for students of the Thai language to
assess their ability to use the appropriate register with interlocutors of different relationships, ages, and social status in face-to-face oral interactions. We would most likely be interested in measuring their knowledge of the appropriate personal pronouns and politeness markers, and not the extent of their knowledge of any particular information that might be the topic of the interaction. We would therefore not include topical knowledge in the construct definition. We might develop a set of role plays that dealt only with topics with which we expected the students to be familiar. We might include role plays with a hypothetical superior, an office associate, a close friend, a hypothetical subordinate, and a young child. Test takers' responses to the role-play prompts and questions would be rated only in terms of the appropriateness of the register used.
Option 2: Include both topical knowledge and language ability in the construct definition

Typical situations
Where test takers are expected to have relatively homogeneous topical knowledge:
- language for specific purposes programs, where the language is being learned in connection with specific academic disciplines, professions, or vocations, and where inferences from test scores are to be used to make decisions about the test takers
- selection for professional or vocational training programs, or for employment, where scores from the language test will be the major factor considered in the selection process. This involves using these inferences to make predictions about test takers' capability to perform future tasks or jobs that require the use of language.
Intended inference
Inferences will be made about the ability to interpret or express specific topical information through language.
Potential problem 1
The test developer or user may mistakenly fail to attribute performance on test tasks to
topical knowledge as well as to language ability. This problem of inference may be due to
lack of clarity in the specification of test tasks and in the scoring of responses.
Potential problem 2
Test scores cannot provide specific feedback to the test user or to test takers, for diagnostic
purposes.
Possible solution
Clear specification of the construct definition for each test task, and of the criteria for scoring test takers' responses.
Example 1
Suppose we wanted to assess diplomats in the foreign service on their ability to read and
understand descriptions of political activities associated with elections in the foreign country
to which they are likely to be posted. We might design a set of tasks in which test takers are
presented with a number of passages representing the types of discourse normally found in
newspaper accounts. We might develop for each passage a set of multiple-choice items.

Example 2

Suppose we wanted to assess the ability of a prospective teaching assistant to give a short classroom lecture on the subject matter that she might be expected to teach, such as educational evaluation. We might design a test task that simulates a lecture, and define the construct as the ability to give a well-organized lecture on the topic of validation. (See the projects in Section Three.)
Comments
Earlier in this chapter we discussed using test scores to make inferences and predictions about individuals' capability to perform tasks or jobs that
require the use of language, and pointed out the demands this places on the test developer,
who must clearly delineate the roles of individual characteristics, including topical
knowledge, in the construct definition. In Chapter 11 we point out some of the problems
involved in attempting to rate samples of spoken or written language with holistic rating
scales. Because of these problems, as well as those of potential lack of clarity in test task
specifications and lack of detailed feedback, we believe that the next option is preferable in most situations in which the test user wants to make inferences about both topical knowledge and language ability.
Option 3: Define topical knowledge and language ability as separate constructs

Typical situations
Where the test developer may not know whether test takers have relatively homogeneous or
relatively widely varied topical knowledge, and wants to measure both language ability and
topical knowledge. (This is very similar to Option 2, with the differences that (1) the test developer has no expectations about test takers' topical knowledge and (2) the test user wants to be able to make inferences about both language ability and topical knowledge.):
- language for specific purposes programs, where the language is being learned in connection with specific academic disciplines, professions, or vocations
- selection for vocational training programs, or for employment, where scores from the language test will be the major factor considered in the selection process. These inferences will be used to make predictions about test takers' capability to perform future tasks or jobs that require the use of language
- research in which language ability and topical knowledge are included as separate constructs
Intended inferences
Separate inferences will be made about language ability and about topical knowledge.

Potential problem
Less practical than Option 2, because it requires two separate sets of test tasks, with the additional demands on resources that this entails.
Example 1
Suppose we wanted to make inferences about elementary school students' ability to express,
in writing, their knowledge of how to use a dictionary. We could present them with a writing
prompt that requests them to explain to another student how to find the meaning of a word in
the dictionary. We could use separate rating scales to score their language ability and topical
knowledge. Thus, we could rate the composition in terms of areas of language ability, such as grammatical accuracy and rhetorical organization. In addition, we could rate it in terms of accuracy of content. This would potentially enable us to make separate inferences about students' control of areas of language ability in writing and of
topical knowledge.
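The separate-scales scoring just described can be illustrated with a short sketch. This is purely an illustrative sketch of the idea: the category names, the 0-4 band range, and the averaging are our own assumptions, not part of any published rating scale.

```python
# Illustrative sketch of Option 3 scoring: the same written response is
# rated on two separate analytic scales, yielding independent scores for
# language ability and for topical knowledge. Category names and the
# 0-4 band range are hypothetical, not from any published rubric.

LANGUAGE_CATEGORIES = ("grammatical accuracy", "organization")
TOPICAL_CATEGORIES = ("accuracy of content",)

def score_response(ratings: dict) -> dict:
    """Average the analytic ratings within each construct separately."""
    for category, band in ratings.items():
        if not 0 <= band <= 4:
            raise ValueError(f"band for {category!r} out of range: {band}")
    language = [ratings[c] for c in LANGUAGE_CATEGORIES]
    topical = [ratings[c] for c in TOPICAL_CATEGORIES]
    return {
        "language ability": sum(language) / len(language),
        "topical knowledge": sum(topical) / len(topical),
    }

# A composition may be linguistically strong but factually weak (or vice
# versa); the two scores support separate inferences about each construct.
scores = score_response({
    "grammatical accuracy": 4,
    "organization": 3,
    "accuracy of content": 1,
})
print(scores)  # {'language ability': 3.5, 'topical knowledge': 1.0}
```

The point of the design is that neither score contaminates the other: a low topical-knowledge score does not lower the language-ability score, which is exactly what Option 3 requires.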
Example 2
Suppose we wanted to develop a test for immigrant children entering an elementary school. For part of this test, we may want to
measure their ability to use English to understand written instructions, and include in the
construct knowledge of the vocabulary of the number system, dates, times, and scheduling.
In addition, we may want to measure their knowledge of and ability to perform the basic mathematical operations. Because we did not know in advance whether the children had topical knowledge of these mathematical operations, we could develop two sets of test tasks that involve reading, each with a different focus.
The first set of tasks, focusing on language ability, could include written instructions
about when to report for different activities. Test takers could be required to give short written
answers, consisting essentially of numbers, to simple questions about times for activities.
Scoring criteria would be whether or not the test takers provided the factually correct times.
The second set of tasks, focusing on topical knowledge, could include solving simple arithmetic problems presented in writing, involving addition, subtraction, multiplication, and division. Scoring criteria would be based upon whether or not the children could calculate the correct answers.
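The two-set, right/wrong scoring procedure described above can also be sketched in code. The items, answers, and keys below are invented purely for illustration; only the scoring logic (dichotomous scoring, with the two totals kept separate) reflects the text.

```python
# Illustrative sketch: two sets of dichotomously scored (right/wrong)
# tasks, one targeting language ability (reading schedule instructions)
# and one targeting topical knowledge (simple arithmetic). Item ids,
# answers, and keys are invented examples, not actual test content.

language_set = {  # item id -> (test taker's answer, answer key)
    "when_is_lunch": ("12:30", "12:30"),
    "when_is_recess": ("10:00", "10:15"),
}
topical_set = {
    "7_plus_5": ("12", "12"),
    "20_divided_by_4": ("5", "5"),
}

def dichotomous_total(task_set: dict) -> int:
    """Score each task 1 if the answer matches the key, else 0, and sum."""
    return sum(1 for answer, key in task_set.values() if answer == key)

# The two totals are reported separately, supporting separate inferences
# about language ability and about topical knowledge.
print("language ability:", dichotomous_total(language_set))   # 1 of 2
print("topical knowledge:", dichotomous_total(topical_set))   # 2 of 2
```

Because each task set is scored and reported on its own, a child who reads the instructions well but cannot yet divide receives a high language score and a low topical score, rather than a single ambiguous total.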
Comments
In situations where the test developer knows very little about the test takers' areas or levels of topical knowledge, this is probably the most justifiable approach in terms of validity of inferences. However, while this may be so, it makes the greatest demands on resources and expertise.
The role of topical knowledge in the interpretation of test scores is a problem fundamental to all language tests. There are no easy solutions, and
there is certainly no universal solution for all testing situations. The particular solution that
the test developer arrives at will be a function of the various factors discussed above. What is
clear is that the test developer cannot simply assume, either explicitly or by default, that
topical knowledge need not be addressed simply because the focus of the test is on language
ability. It is equally clear that in all situations the language test developer needs to obtain as much information as possible about potential test takers' areas and levels of topical knowledge, and should consult with content specialists in determining the areas of topical content to include in the test and the accuracy of the information that is included.
Furthermore, in situations where the test developer/user wants to make inferences about test takers' language ability and areas of topical knowledge, it is crucial for the test development team to include specialists in both language and the content area(s) to be assessed.
In Chapter 4 of this book we have taken the position that the familiar language skills (listening, reading, speaking, and writing) should not be included in the construct definition among the components of language ability. If we define the skills simply in terms of mode (productive or receptive) and channel (audio or visual), we end up with skill definitions that miss many of the other important distinctions between language used in particular tasks. For example, suppose we
have two tasks. In one, test takers prepare a speech in which they compare and contrast the
positions taken by two authors in two different stories. In the second task, test takers write a
two-line note to the mail carrier. Nominally, the first task is a speaking task, while the second
is a writing task. Yet these two tasks differ in many other ways as well. In the first task, the
length of the response is long, the language of the input is long and complex, the language of
the expected response is highly organized rhetorically and in a formal register, and the scope
of relationship between input and response is broad. In the second task, the length of the
response is short, there is no input in language form, the language of the response is not highly organized and is in an informal register, and the scope of relationship between input and
response is narrow. Thus, focusing only on the difference between the so-called skills (speaking versus writing) misses much that goes into distinguishing between these two language use tasks. That is, it suggests that the names of the skills alone are sufficient to define the tasks.
Another reason for not including a specific skill in the construct definition is that, as Widdowson (1978) has pointed out, many language use tasks involve more than one skill. What he calls the communicative ability of conversation involves listening and speaking, while what he calls correspondence involves reading and writing. Therefore, rather than defining a construct in terms of skills, we suggest that the construct definition include only the relevant components of language ability, and that the skill elements be specified as part of the characteristics of the test tasks.
Exercises
2. Obtain a copy of a published language test and test manual. Read whatever material is available dealing with the construct definition. How is the construct defined? What components of language ability are included in the construct definition? Does the construct definition include the metacognitive strategies? How adequate do you find the construct definition to be? How might you revise it?
3. Recall the last language test you developed or used. How did you define the construct
to be measured at the time? How might you define it now? If you did not define the construct to be measured, how did that affect the usefulness of the test?
4. Has the discussion of defining the construct to be measured helped you distinguish the construct from the characteristics of TLU tasks? How? Do you find this distinction useful? Why or why not? What would be the consequences of defining the construct to include characteristics of the TLU setting (such as the ability to take an order in a restaurant)?
5. Are there any testing situations in which you feel it would be justified not to define the construct to be measured? Why or why not? If we want to predict the ability of a test taker to perform a specific TLU task and we know exactly what that task is, is it still necessary to define the construct to be measured?
Chapter summary
The process of test development begins with the preparation of a set of descriptions and
definitions. We begin by describing the test purpose, which in the case of a language test is
primarily to make inferences about language ability. Additional purposes may include using these inferences to help us make decisions affecting test takers, teachers or supervisors, or programs.
Next we describe the TLU situation and tasks so that we can design test tasks that
correspond in demonstrable ways to specific target language use situations and tasks,
enabling us to develop authentic test tasks. We start by identifying the TLU situation and
tasks. We then describe these in terms of their distinctive characteristics. If we are designing a test for use in a language instructional setting, we need to look beyond the instructional tasks to TLU tasks in the real-life domain.
Next we describe the characteristics of the language users/test takers, which allows us
to develop test tasks that will be appropriate to the test takers. The characteristics of test
takers that we believe are particularly relevant to test development can be divided into four
categories:
1. personal characteristics,
2. topical knowledge,
3. affective schemata, and
4. language ability.
Preparing this description will be relatively easy if we have in mind a specific TLU situation and a specific group of test takers; otherwise, it may be considerably more difficult.
Finally, we need to define the construct to be measured so that we can know how to interpret
test scores. During the design stage, we define the construct, language ability, abstractly, and
we can base this definition on either the content of a language syllabus or a theory of
language ability. Although strategic competence will be involved in test performance, the test
developer may or may not want to make specific inferences about this component of
language ability, and thus must decide whether or not to include it explicitly in the definition
of the construct. Test takers' topical knowledge will also be involved in test performance, and
thus the test developer must also decide on the nature of the inferences to be made, and define
the construct accordingly. There are essentially three options for this:
1. do not include topical knowledge in the construct definition, and make inferences about language ability only,
2. include topical knowledge in the construct definition, and make inferences about the ability to interpret or express specific topical information through language, and
3. define language ability and topical knowledge as separate constructs, and make separate inferences about each.
Typical situations in which these three approaches might be appropriate are presented, along with intended inferences, potential problems, and possible solutions for each.
Finally, we take the position that language skills are not part of language ability, but consist of specific activities or tasks in which language is used purposefully. We would therefore include only the relevant components of language ability in the construct definition, and specify the skill elements as part of the characteristics of the test tasks.
Notes
1. Note to the teacher: This chapter deals with a number of important, related issues in
test design, which are relatively complex. For this reason, it is relatively long. One
way of teaching the material might be to assign each section to be read and worked
with individually. To this end, independent exercises and suggested readings are provided for each section.
2. The test developer and the test user may be one and the same individual, as is typically the case for a classroom test developed and used by a teacher. Or a test may be developed by testing specialists for use by admissions officers in colleges and universities. Or the test user might be one member of a team of test developers, as with a test developed collaboratively by the teachers in a language program.
3. McNamara states that these two types of inferences are based on different hypotheses. He calls the first type of inference, about individuals' ability to perform TLU tasks, a 'strong' performance hypothesis, while he calls the second type, about individuals' ability to use language, a 'weak' performance hypothesis.
4. For stylistic reasons, from now on we will use the acronym TLU only where it is
essential for distinguishing TLU domains and tasks from test tasks. Thus, where the
meaning is clear from the context, we will refer simply to domain and task.
5. We would note here that the broader the TLU domain for which we are designing a
test, the more difficult it may be to design a useful test. This is because the problem of
sampling from the domain of TLU tasks becomes more difficult as ihe domain of
TLU tasks becomes larger and hence more diverse. Thus, with a very large, diverse
domain, both authenticity and the validity of inferences about test takers ability to
perform in the TLU domain may be compromised by practical restrictions on the number and variety of test tasks we can include.
6. Having distinguished between a real-life TLU domain and a language instructional TLU domain, we will refer to these simply as the real-life domain and the instructional domain.
7. We recognize that our view that specific components of language ability need to be delineated and clearly defined in the construct definition is not universally held among
language testers. Indeed, a fairly large number of language testers view language
ability as a holistic, unitary ability, and thus do not attempt to delineate different
components in the way they assess this. The most prominent example of this is the
view which has informed the ACTFL Oral Proficiency Interview, and its predecessor,
the FSI oral proficiency interview (Lowe 1988). Suffice it to say that the authors do
not agree with this view, for reasons that have been widely discussed in the research literature, and refer those readers who are interested in this debate to the relevant references.
8. At present the role of strategic competence in language use, and hence in language test performance, is a relatively new area of research, and we cannot be more specific about it. However, we believe that this is a promising approach in researching the nature of language test performance.
9. We would point out that allowing test takers the choice of taking tasks with different topical content introduces another potential source of inconsistency across test tasks, and that the test developer must take appropriate steps to determine the extent to which this affects the consistency of measurement.