
Chapter 4

Preparing Instructional Design Objectives and Assessment Strategies

The great aim of education is not knowledge but action.
—Herbert Spencer

Chapter 4 Topics
▪ The role of objectives in instruction and instructional design
▪ Essential characteristics and components of instructional design objectives
▪ How to decide on appropriate assessment formats for various types of objectives
▪ Procedures for writing effective instructional design objectives
▪ Common errors and problems in writing instructional design objectives

Chapter 4 Learning Outcomes


1. Identify the roles that instructional design objectives play in a systematic design process.
2. Analyze instructional design objectives for missing characteristics and components.
3. Apply criteria for selecting appropriate assessment formats for given situations.
4. Identify and sequence the steps required to write effective instructional design objectives.
5. Apply criteria for effective instructional design objectives by correcting objectives that are not
stated appropriately.


SCENARIO
The essential role of objectives and assessments
Aubrey Fair was an instructional designer for a large training consultant firm. A manu-
facturing company had hired his firm to update training in antitrust laws that it re-
quired all its managers to take. It was imperative that the company’s managers know these laws well and not inadvertently break any antitrust rules, because the company would be held responsible for any infractions. For the past few years, the workshop had been offered by the company’s lead attorney, an expert in antitrust laws. Aubrey was not told what needed to be updated; he was simply instructed to begin by meeting with the attorney.
Aubrey greeted the attorney cordially. “So how long have you been offering
these antitrust workshops?” he asked. “For about two-and-a-half years,” replied the
attorney stiffly, “and, frankly, I don’t see why we need to change them at all. They’ve
been working just fine up to now.” Aubrey immediately sensed the need to tread care-
fully. He was obviously invading the attorney’s domain, but he needed to know how
the training had been conducted in the past.
“Yes, I’ve heard you’re the company’s legal expert, and I’m very interested in
the approach you’ve been using in your workshops,” said Aubrey amiably. “Can you
share some of your materials with me? I’m especially interested in the objectives of
the workshop.”
“They’re quite straightforward, as you can see,” said the attorney, handing him a notebook. “This is my instructor manual and handouts.” Aubrey read a few of
the statements on the list labeled “Workshop Objectives.” They included:
• Review the definition of “antitrust” as reflected in the basic laws.
• Review the purposes and main points of each of the laws.
• Give students an appreciation for the purposes of antitrust laws in business.
• etc.

Aubrey said, “Hmmm, I see. How do you tell if the workshop participants learn
what you have in mind? Do you have tests or assessments to measure what they’ve
learned?”
The attorney said, “Oh, yes, I can always tell they really get it. There are always
a lot of good questions, and everyone is very enthusiastic about the content. I always
have one or two come up afterward to shake my hand and tell me they’re glad they
attended. These are high-level guys, though, and I feel they would find tests demean-
ing. We do a debriefing at the end and go through a checklist together to make sure
they know everything they should. It really works well.” Aubrey thanked the attorney
and asked to take the notebook with him to look over.

BACKGROUND ON OBJECTIVES AND ASSESSMENT STRATEGIES


IN INSTRUCTIONAL DESIGN
A foundational concept of instructional design is that effective instruction not only results in learning, which is an internal, unseen change in learners, but also makes possible a change in action or performance, which is an external, observable change. To paraphrase the opening quote by Spencer, the desired result of instruction is action as well as knowledge. In the exchange between the attorney and the instructional designer in the scenario at the beginning of this chapter, the designer was expecting to see the
workshop objectives as statements of actions that learners have to do to demonstrate that they have achieved the intended results from the instruction. He also was looking
for clear, observable indicators that the attorney used to determine that workshop par-
ticipants had met the objectives. Instead, the attorney gave the designer a list of teach-
ing activities that he, the attorney, did during the workshop. Learning the workshop
content on antitrust laws was important to the participants; it should make them famil-
iar enough with situations that they might encounter where antitrust rules applied so
that they would not inadvertently break any of these rules. The attorney-instructor felt
that the workshop participants really “got it.” But how could he tell?
Statements of objectives are most helpful when they communicate clearly and
unambiguously the actions students are to do to show they have learned. Although
this may sound like a commonsense approach, it is no easy matter to write such clear,
specific objectives, even for those who are experts in the content. This section reviews the key role that instructional design objectives play in a systematic design approach, describes essential characteristics and components of these objectives, and gives examples of them in a variety of content areas.


A Review of Instructional Roles for Objectives


Before describing essential characteristics of good objectives and how to go about
writing them, this section provides background on the purposes that objectives serve
in instruction and why they have been called by various names in the past. It also dif-
ferentiates objectives from another term used to describe student performances that
instruction enables: standards.

PURPOSES OF OBJECTIVES IN INSTRUCTION.  As Waugh and Gronlund (2013) observe, objectives play a key role in both instruction and assessment. “By describing the performance that we are willing to accept as evidence of achievement, we provide a focus for instruction, student learning, and assessment. Objectives help keep all three
in close harmony” (p. 35). Objectives like the ones at the beginning of textbook chap-
ters serve as “learning guides” for students. Readers use them not only to focus on
information they are to derive from the chapter, but also to see how they will be required to demonstrate that they understand it.
Objectives that are written in a more detailed and precise way than other kinds
of objectives serve an essential role for instructional designers. After designers write
objectives, they focus all subsequent design activities on making students able to
demonstrate the behaviors described in the objective statements. Thus, objectives
serve as a framework for creating the instructional materials. Objectives also provide
criteria by which designers and others judge the quality of instruction. If students
can do the activities described in the objectives, the instruction is deemed to be suc-
cessful. If they cannot do the activities, the instruction is considered to be in need
of revision.
Robert Mager’s foundational 1962 book Preparing Instructional Objectives not only described how to write clear, unambiguous instructional objectives (postinstruction actions students must be able to demonstrate to show they have achieved the intended results of the instruction), but also made the practice very popular in education and training. The 1970s found school districts and other institutions engaged in writing instructional objectives for every topic they taught. However, many of these
organizations never created actual assessments linked to these objectives or made sure
that instruction was in place to help bring about the outcomes they specified. There-
fore, the most important role of instructional objectives was never served. To be most useful in improving instruction, objectives must be treated not as ends in themselves, but as the first in a series of carefully linked design activities.

STANDARDS VS. OBJECTIVES.  In the last decade, content standards have become an increasingly well-known and important kind of performance “target” in education and training. For example, in the United States, every state has a set of standards for what students are to learn in each content area. In addition, Common Core Standards have been created by the National Governors Association Center for Best Practices and the Council of Chief State School Officers (http://www.corestandards.org). At this time, 45 states and the District of Columbia, four
territories, and the Department of Defense Education Activity have adopted the
Common Core Standards. While these are definitely statements of what students
should be able to do after instruction, they are more global in nature than those
required for instructional design purposes. For example, Figure 4.1 compares one of the Common Core Standards with three different objectives that might be designed to measure achievement of that standard; note how a single standard can be assessed in many different ways, with different actions and criteria for meeting it.

TERMS FOR OBJECTIVES.  Various terms have been used to describe the behaviors
students should be able to do as the result of instruction. These include: behavioral
objectives, instructional objectives, objectives, outcomes, outcome-oriented objec-
tives, and performance objectives. However, all of these terms are used in contexts
other than systematic instructional design, and the meaning becomes clear only if
the reader knows the context and purpose for which they are being used. The term
instructional design objective is used in this design model to clarify that it is the prod-
uct of this instructional design step: a statement of behaviors and assessment criteria
that instructional designers write to specify what learners should be able to achieve as
a result of the instruction. This term also helps differentiate statements of objectives
that are useful for design purposes from those given to students or stated in textbooks,
because the latter may not be as detailed or stated in the same way as those needed
to drive instructional design.

Common Core Standard for Grade 4 Language Arts:
L.4.5 Explain the meaning of simple similes and metaphors (e.g., as pretty as a picture) in context.

Measurable Performance Objectives for the Standard:
1. In at least 8 of 10 sentences that each contain an underlined simile or metaphor, write below the sentence a synonym for the figure of speech.
2. Given a 6- to 8-sentence paragraph containing a total of 2 similes and 2 metaphors and a list of meanings for them below the paragraph, circle all 4 figures of speech and write each beside its correct meaning.
3. In 10 short poems, 5 of which contain a simile and 5 of which contain a metaphor, identify the figure of speech correctly in at least four of each set by circling it and writing its meaning below the poem.

FIGURE 4.1  Example standard and performance objectives matched to it.



Check Your Understanding 4.1

Objective 1 Exercise—Roles for Instructional Objectives. Place a check by each of the following that are roles that instructional design objectives should serve:
______ 1. Serve as learning guides for textbook readers
______ 2. Serve as a framework for creating the instructional materials
______ 3. Provide criteria by which designers and others judge the quality of instruction
______ 4. List all the required steps in presenting an instructional sequence
______ 5. Serve the same role as standards such as the Common Core Standards
______ 6. Communicate clearly actions students are to do to show they have learned
______ 7. Focus design activities on making students able to demonstrate stated behaviors


Essential Characteristics and Components of Instructional Design Objectives
While all designers agree that objectives provide an important foundation for instruc-
tional design, several formats for objectives have emerged over the years, all with the
purpose of making the outcomes of instruction clear and unambiguous for design
purposes. The most popular formats are the ABCD, Gagné, and Mager formats; they differ only in the number of components required to achieve this unambiguous quality. A comparison of these formats is shown in Table 4.1.
These formats were developed by instructional design experts in 1962 (Mager,
three components), 1968 (ABCD, four components), and 1974 (Gagné & Briggs, five
components). As the examples in Table 4.1 show, there is considerable overlap among them; each format included the components its developers felt would clarify the outcomes and make the statements most useful to designers. However, many designers report difficulty in making all outcomes fit a given format. There seems to be no
one-size-fits-all method of specifying what should be included. And, although some
instructional design models call for writing objectives before identifying assessment
strategies, experience has shown that designers cannot really prepare objectives with-
out considering assessment methods. They develop the actual assessment materials
later, but deciding how assessment will be done is inextricably connected with how
students will demonstrate what they have learned.
Therefore, objectives and assessments should be considered together, and the
format of the objective depends in large part on the type of learning outcome. In
order to be clear statements of what students are to do, some objectives may require
only three components and others four or five components. The purpose that design-
ers must keep central in their minds is to make objectives communicate clearly. To
do this, all objectives should have the essential characteristics and components dis-
cussed in the following sections. How to go about writing statements that meet these
criteria is described later in this chapter under the section Preparing Instructional
Design Objectives.

ESSENTIAL CHARACTERISTICS.  No matter how they are stated, instructional design ob-
jectives should reflect certain qualities. First, there should always be an observable
action of some kind (e.g., write, create) rather than just an internal ability (e.g., under-
stand, know, learn) or a statement of content to be covered (e.g., review three chap-
ters). Second, the focus should always be on the actions of students after instruction,
rather than those of the teacher or student during instruction. Finally, statements should be so unambiguous that anyone reading them should know exactly what students are to do to show they have learned. It is not necessary to state an objective in only one sentence. Clarity and specificity are the most important qualities for instructional design objectives, and achieving these qualities may require several sentences or a series of phrases.

TABLE 4.1  Three Popular Formats for Instructional Design Objectives

ABCD Format
• Developed by: Instructional Development Institutes, 1968 (Seels & Glasgow, 1998)
• Components: Four parts — Audience (type of students), Behavior (action verb), Conditions, Degree (criteria for judging performance)
• Comparison of components: A = participants in a preconference web design workshop; B = make text and graphic items into links; C = given a sheet of paper that designates five text and graphic items and what they are to link to and an on-screen web page in an editor software that contains all the items; D = correctly working for five designated items
• Resulting objective statement: When given (1) a sheet of paper with a list of five text and graphic items and what they are to link to, and (2) a web page in an editor software that contains all the items, participants in a preconference web design workshop will make five designated text or graphic items from the page into correctly working links.

Gagné Format
• Developed by: Robert Gagné and Leslie Briggs (1974)
• Components: Five components — characteristics of the stimulus situation, learned capability verb, object of verb, action verb, special conditions
• Comparison of components: Stimulus = given a sheet of paper that designates five text and graphic items and what they are to link to and an on-screen web page in an editor software that contains all the items; Capability verb = demonstrate; Object of verb = correct procedure for making links; Action verb = creating links; Conditions = correctly working in all five items
• Resulting objective statement: When given (1) a sheet of paper with a list of five text and graphic items and what they are to link to, and (2) a web page in an editor software that contains all the items, the student will demonstrate correct procedure for making links by creating correctly working links for all five items.

Mager Format
• Developed by: Robert Mager (1962)
• Components: Three components — Behavior (action verb), conditions, and criteria for judging performance
• Comparison of components: Behavior = create correctly working links; Conditions = given a sheet of paper that designates five text and graphic items and an on-screen web page in an editor software that contains all the items; Criteria = for all five items
• Resulting objective statement: When given a sheet of paper with a list of five text and graphic items and an on-screen web page in an editor software that contains all the items, the learner must create correctly working links for all five items.

ESSENTIAL COMPONENTS.  Objective statements are most helpful for design purposes
when they have certain components. At the minimum, each statement should contain
three items to specify how the student will demonstrate what they have learned: ac-
tion, assessment, and performance level.
• Action.  The action the student is required to do is derived from the behavior
identified in the learning map, which designers create in the step before writ-
ing objectives. (See Chapter 3.) For example, one of the outcomes in a 3-D
Drawing Sample Project learning map in Chapter 3 is “Complete 3-D drawing
model.” The obvious action that would demonstrate knowledge of drawing
principles is: “Draw a model.” Actions should always be expressed as observ-
able activities; for example, “design, write, solve, draw, make, choose.” Avoid
action verbs that describe internal conditions that cannot be directly seen and
measured. Examples of these “verbs to avoid” are: understand, know, appreci-
ate, and feel.
• Assessment.  The designer must identify the circumstances under which the
student will complete the action. This may include methods, student/instructor
materials, and/or special circumstances that will apply as students show what
they have learned. Many objectives do not require that all four of the following
components be specified in order to make an objective clear enough for design
purposes; it depends on the type of learned behavior and what the designer con-
siders necessary for a valid assessment.
– Methods.  The objective should identify the means of assessing the action.
Completing a test or survey, giving a verbal description, performing an activity,
or developing a product all are possible assessment methods. (For details on
assessment method options, see the following section on Essential Criteria for
Selecting Appropriate Assessment Methods.)
– Student materials.  Assessment may require that students have additional
materials such as data charts and tables, calculators, dictionaries, or textbooks
available to them. If so, the objective should state them.
– Instructor materials.  Materials such as a rubric or a performance checklist
may be required so that instructors can rate or track performance. A rubric is
a scoring guide, and a performance checklist is a list of component tasks or
activities in a performance. (Both will be discussed in more detail in Chapter 5.)
For example, if students do web page layouts, the products might be judged
by a rubric or criterion checklist.
– Special circumstances.  Sometimes the objective must include a description
of certain conditions in which the assessment will be done. For example, stu-
dents must do an activity within a certain time limit or without any supporting
materials.
• Performance level.  Perhaps the most difficult part of writing an objective is
specifying how well a student will have to do an activity or how much they must
do it to show they have the necessary level of expertise. Designers must decide
what will constitute acceptable performance and specify it. Depending on the
assessment method, there are several ways to express acceptable performance
levels.
– Number correct.  Students may need to do a certain number of items or ac-
tivities correctly to demonstrate they have learned. If the assessment method
is a written test, the percentage or number of items required for passing the
test should be stated. If the action is a motor skill such as operating a piece
of equipment, the students may need to do it correctly a certain number of
times.
– Level of accuracy.  If the designer knows there will be variation in the action,
the tolerance for this variation should be specified. For example, if an architec-
tural student is required to calculate the weight a structure will bear, a tolerance
range in pounds or ounces must be stated.
– Rating.  If the quality of performances or products is measured by a rubric or
checklist, the acceptable rating must be given. For example, in the web page
example, if a rubric is used to assess the quality of the student’s web page de-
sign, the designer would have to specify what would constitute an acceptable
rubric score. If students are to complete a series of activities, the rating may be
how many of the total number they must complete.
See Table 4.2 for examples of objectives that reflect all these components.

TABLE 4.2  Sample Instructional Design Objectives with Essential Components

Example 1
• Target behavior from learning map: The student identifies examples of text, images, links, and tables on a web page.
• Action: The student labels elements of a web page.
• Assessment: The student labels a sample page printout randomly selected by the instructor from 10 printouts. On each page, 15 elements are indicated with an arrow and a numbered line. The student must label all parts within 10 minutes. (Spelling does not count.)
• Performance level: The student must correctly label 14 of 15 elements.

Example 2
• Target behavior from learning map: The student classifies sentences as simple, compound, complex, or compound–complex.
• Action: The student identifies sentences in a paragraph as to type.
• Assessment: The paragraph on a computer screen has 15 sentences with at least 2 of each type represented. The instructor assigns a color code for each type. The student codes all 15 of the sentences within 10 minutes.
• Performance level: At least 14 of the 15 must be correctly coded.

Example 3
• Target behavior from learning map: The student demonstrates the procedure for using AutoCAD to create a 3-D plane in space.
• Action: The student creates a CAD drawing of a structure with 3-D planes.
• Assessment: On an AutoCAD screen, the student draws a roof with the correct size and shape within ten minutes and with no reference materials. The instructor grades with a checklist.
• Performance level: The roof drawing must meet at least 9 out of 10 accuracy and quality criteria on the instructor checklist.

Example 4
• Target behavior from learning map: The student states names for all bones of the shoulder, wrist, and hand.
• Action: The student labels bones on a computer-generated image of the skeleton.
• Assessment: On a computer-screen image of the upper extremity, students enter the name of the bone on the line opposite it, all within 30 minutes, using no reference materials; spelling counts.
• Performance level: Passing score is at least 61 of 64.

Example 5
• Target behavior from learning map: The student executes a typing exercise at 60 WPM.
• Action: The student types a paragraph at 60 WPM.
• Assessment: The student uses Microsoft Word software to type an assigned paragraph. If needed, the instructor will assist with setting up a new Word document. Students will be given the paragraph on paper and a verbal signal to begin and end the test.
• Performance level: The paragraph must contain no more than three typographical errors.

Check Your Understanding 4.2

Objective 2 Exercise—Characteristics and Components of Instructional Design Objectives. In each of the following statements, identify which required characteristics and components it lacks. Place the letter(s) of the missing characteristic or component (from the list that follows the statements) on the line next to each statement. (NOTE: Some statements will be missing more than one characteristic or component.)

Incorrect Objective Statements
______ 1. Create a 5-minute video using Adobe Premiere software; the video product will be evaluated by a rubric.
______ 2. Identify the appropriate IRS form for given tax-reporting needs by selecting the form name/number from a list of possible forms in an online testing program.
______ 3. Teach the correct protocol for setting a broken arm by demonstrating the operation on a patient mannequin.
______ 4. Know how to write a subroutine in the C++ programming language.
______ 5. Correctly dissect a frog using a computer simulation. The operation will be graded by a performance checklist.

Required Characteristics/Components
A. The target behavior is stated.
B. The behavior is observable.
C. It is stated as a student (not teacher) action.
D. It is a postinstruction behavior.
E. It communicates clearly.
F. It contains an assessment strategy.
G. It contains a performance level.


Essential Criteria for Selecting Appropriate Assessment Methods


Because assessment methods are an important component of instructional design
objectives, writing objectives requires designers to select from the array of available
assessment techniques. This section reviews the characteristics and uses of each
type of assessment method and factors to consider when selecting an assessment
method. Chapter 5 describes in detail how to prepare effective assessment materials
of each kind.

MENTAL SKILLS AND INFORMATION TESTS.  In recent years, test formats long used in education and training (e.g., multiple-choice tests), referred to here as mental skills and information tests (or simply tests), have come under various kinds of criticism. These are instru-
ments consisting of individual items that are intended as indirect measures of student
abilities. Some educators feel the instruments are overused and are valid measures
of learning primarily for lower-level skills. However, tests remain the most com-
monly used assessments in education and training, and many educators feel that, when
properly applied and developed, they can effectively assess learning at many different
levels. Although most of these methods require a relatively simple external response
from the student, they can require a complex internal process. For example, the multiple-
choice example in Table 4.3 requires only that the student read the item and circle
a choice. However, in order to get the item correct, the student must first solve
a complex problem. Another criticism of true/false, multiple-choice, and matching
formats is that students can get some correct by guessing. However, several tech-
niques are used to address this potential problem. For example, in a multiple-choice test, designers may require a minimum number of correct items and can provide carefully
crafted wrong answers or distractors based on answers that can result from incorrect
processes.

PERFORMANCE MEASURES.  Checklists and rubrics have gained popularity in recent years as they have become associated with constructivist teaching methods and “authen-
tic assessment,” or requiring a behavior that simulates a real-world application of a
learned ability. For example, students may demonstrate a combination of mathemati-
cal and problem-solving skills by developing a solution to a scenario, or they may
work in a small group to create a multimedia presentation to show results of research
and skills in working cooperatively with others. Another popular approach is student
portfolios, collections of people’s work products over time, arranged so that they and
others can see how the person’s skills have developed and progressed. Instructors use
checklists, rubrics, or a combination of these to rate complex work or products. Sev-
eral organizations have developed and tested checklists and rubrics to support their
own activities and have offered them for use by others with similar needs. Examples of
some of these materials may be found in the Instructor Manual and, when applicable,
may be used by students in their instructional design projects.

ATTITUDE SURVEYS.  When the objective of the instruction is to change students’ perceptions or behavior, Likert scales or semantic differentials can be used to ask them how they feel
about a topic or what they would do in a given situation. Likert scales are assessments
that ask the degree to which one agrees with statements, and semantic differentials
are assessments that ask where one’s views of something fall between a set of bipolar
adjectives. (Both will be discussed in more depth in Chapter 5.) Of course, we can
never be certain that what students say they will do on attitude measures is what they
actually will do. For example, a survey found a disconnect between what students say
they want to eat and what university food-service managers observed them choosing
to eat. The students said they wanted to eat healthy food like salads and fruit; how-
ever, the most popular foods were pizza and hamburgers (Farrell, 2002). Because most
actions cannot be observed so directly, attitude measures remain the most useful ways
to infer students’ likely performance and, thus, indicate that the instruction has had
the desired impact.

FACTORS TO CONSIDER WHEN SELECTING ASSESSMENT METHODS.  Which assessment method fits a given objective? As with many instructional design activities, designers
must use guidelines rather than formulas to answer this question, and there is usually
more than one correct strategy. Consider the following four guidelines when selecting
a type of instrument to use:

• Guideline #1: Directness of measurement.  What is the most direct way to determine if the student can do the desired performance? Very often, assess-
ments must use indirect strategies because it is not practical to do direct obser-
vations of student performances. For example, after instruction, you may want
to give someone a sales report and ask them to give an analysis of its important
information. But because assessment must be faster and easier to accomplish,
you have to choose a less direct method: asking them specific questions about
the report, each of which has one correct answer. You may want to see if a per-
son’s attitude toward a topic has changed. The most direct method is watching
his or her behavior over time to see what choices they make. Again, because
this is not feasible, you must choose a less direct method: asking them ques-
tions about what they will do in the future. In circumstances where there are
many learners to assess and time is an important factor, most assessments must
be indirect measures. However, the idea is to choose the most direct way that
is also logistically feasible to carry out in the setting in which instruction will
take place. When confronted with more than one way to assess individuals in-
directly (e.g., a matching versus a multiple-choice test), choose the one that is
the most direct measure of the performance learners would do in “real-world”
environments.
• Guideline #2: Resources required to establish reliability and validity. 
Designers must also make decisions based on their estimates of time and person-
nel resources it will take to make sure instruments are valid and reliable. Validity
means an assessment method measures what it is supposed to measure (Gay,
Mills, & Airasian, 2009; Oosterhof, 2009). Reliability means an assessment yields
consistent results over time, over items within the test, or over two or more scor-
ers. (Also see Chapter 5 for a more in-depth discussion of validity and reliability
when developing each type of instrument.)
– Validity.  For designers, validity means that an assessment should be
closely matched to the action stated in the objective. To increase validity,
designers try to select an assessment format that requires as little inference
as possible about whether students can actually do the action whenever they
are required to do it. For example, if the objective calls for students to solve
given algebra problems, a mental skills test that requires them to solve sam-
ple problems and indicate answers would be an appropriate way to infer stu-
dents’ skills in solving any and all such problems. However, if the objective
requires students to demonstrate they can analyze real-world situations and
develop complex solutions that require algebra skills, scenario-based prob-
lem solving evaluated by a performance measure such as a rubric or checklist
would be more appropriate. One method designers frequently use to help decide on assessments is analyzing the action for whether students should
select a response or construct a response (Popham, 2011). Sometimes, the
choice is obvious. For example, if the objective were for students to write a
well-developed paragraph, it would not be valid to ask them questions about
good paragraphs. However, many abilities can be sampled in a valid way
by having students select from possible responses. Deciding on the num-
ber of items or performances to establish competence must also be done at
this point.
– Reliability.  An assessment is reliable if it yields consistent results over time,
over items within the test, and over test scorers. When designers are develop-
ing tests, they are concerned with consistent measurement with an instrument
and over time. However, when selecting an assessment method, designers are
primarily concerned with a kind of reliability known as inter-rater reliability,
or the degree to which two or more persons scoring the same test are likely
to get the same score (Gay et al., 2009). Whenever answers can be scored
objectively (e.g., multiple choice, true/false, matching), scoring reliability is
high. Whenever the scorer has some latitude in determining correct answers
(e.g., short answer, essay, performance measures), scoring reliability is lower.
Designers must select a method that has as much potential as possible for inter-rater reliability while still being a valid measure; a simple way to quantify rater agreement is sketched after this list.
• Guideline #3: Instrument preparation logistics.  Valid, reliable assessments
take time to develop. Activities include: analyzing actions to determine instrument
requirements, writing items, creating scoring criteria, and collecting data to indicate
levels of validity and reliability. Depending on the available time and resources,
designers may wish to use existing assessments whose usefulness has already
been ascertained. For example, an organization may already have a multiple-
choice test or validated rubric. Although the designer might prefer a different kind of test or different items or scoring procedures, there may be no
time to develop and validate them. The solution may be to adopt the existing
method while recommending that another measure be developed to implement
at a later time.
• Guideline #4: Administration and scoring logistics.  The relevance of con-
sidering the time and effort needed to administer and score assessments was
well illustrated in one statewide student testing program. The state had ad-
opted two measures to assess students’ mathematics and language skills: a
multiple-choice test and a performance test scored by rubrics. Although teach-
ers were hard pressed to find time to administer the performance tests, they
eventually collected data from both measures. However, when the time arrived
to score the assessments in order to make student placement decisions for the
next year, state education officials had to admit they lacked a sufficient number
of trained personnel to score the performance tests in time. They decided to
ignore all the carefully collected performance data and use only the less valid
but more easily scored multiple-choice tests. To avoid this kind of situation,
designers must analyze the time required for administration and scoring and
weigh this information against reliability and validity before selecting an as-
sessment format.
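
To make the idea of inter-rater reliability in Guideline #2 concrete, one simple index is the percent agreement between two scorers. This is offered only as an illustrative sketch (Chapter 5 treats reliability in more depth, and more sophisticated indices that correct for chance agreement exist):

\[
\text{Percent agreement} = \frac{\text{number of items both raters score identically}}{\text{total number of items scored}} \times 100
\]

For example, if two instructors apply the same rubric to 20 student products and assign the same rating on 17 of them, their percent agreement is 17/20 × 100 = 85 percent, a quick signal of whether a scoring guide's criteria are clear enough to support consistent scoring.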
Table 4.3 summarizes the options available under each of the three major catego-
ries of assessment methods: mental skill and information tests, performance measures,
and attitude measures. It also gives an example item of each type. Table 4.4 summa-
rizes important issues to consider when selecting assessment formats.

TABLE 4.3  Summary of Types of Assessment Methods and Instruments

Mental Skill and Information Tests

• Multiple choice
  Description: Questions or “stems” with three to five alternative answers provided for each. Students select the most correct answer by circling or writing the number or letter of their choice.
  Sample action: Identify correct answers to geometry problems.
  Sample item: 1. Which of the points listed below is on a circle with the equation (x − 7)² + (y + 3)² = 25?  A. (10, 1)  B. (17, 12)  C. (−8, −23)  D. (5, −6)

• True/false or yes/no
  Description: Statements that the student must decide are accurate or not and write or circle true or false or a similar indicator (e.g., yes/no, correct/incorrect, right/wrong, plus/minus).
  Sample action: Identify whether or not something is a prime number.
  Sample item: Tell whether or not each of the following numbers is a prime number by circling T if it is and F if not:  T F 1. 92   T F 2. 650

• Fill in the blank (completion)
  Description: Statements that each have a word or phrase omitted that the student must insert.
  Sample action: Analyze a sales report to determine important items of information.
  Sample item: The report reflects that the company’s best customer in the first half of the year was _____.

• Short answer
  Description: A set of questions, each of which the student answers with a word or brief phrase.
  Sample action: Identify the German verb form that is appropriate for each sentence.
  Sample item: Wie _____ es Ihnen? (gehen)

• Matching
  Description: Two sets of related items; the student connects them by writing one beside the other or writing one’s letter beside the other’s number.
  Sample action: Identify the area of the library where a given library item may be found.
  Sample item: List of materials and list of library areas.

Performance Measures

• Essay (usually assessed by rubric; see the rubric description below)
  Description: A statement or question that requires a structured but open-ended response; students write several paragraphs or pages.
  Sample action: Describe an instance of when the constructivist teaching technique would be an appropriate choice and describe the strategy that would be appropriate for that situation.
  Sample item: Give an example of an instructional objective for which a constructivist teaching technique would be appropriate, describe the technique, and give three reasons it would be appropriate for the objective. (Graded by an attached rubric.)

• Procedures checklist
  Description: A list of steps or activities students must complete successfully.
  Sample action: Demonstrate the procedure for using a digital camera to take a photo.
  Sample item: ______ 1. Turn on the camera.  ______ 2. Adjust the settings, etc.

• Performance or product rating scale
  Description: A list of criteria that students’ products or performances must meet. Each criterion may be judged by a “yes/no” standard or by a level of quality (e.g., 1, 2, or 3; low, medium, high).
  Sample action: Develop a multimedia presentation that meets all criteria for content, instructional design, organization/navigation, appearance, and graphics/sound.
  Sample item (for a multimedia product): Scale: 3 = High, 2 = Acceptable, 1 = Unacceptable.  _____ All content information is current.  _____ All information is factually accurate, etc.

• Performance or product rubric
  Description: A set of elements that describe a performance or product together with a scale (e.g., one to five points) based on levels of quality for each element.
  Sample action: Develop a PowerPoint presentation to present research findings.
  Sample item: See examples at Kathy Schrock’s Guide to Everything website: http://www.schrockguide.net/assessment-and-rubrics.html

Attitude Measures

• Likert scale
  Description: A set of statements; students must indicate a level of agreement with each statement.
  Sample action: Demonstrate a willingness to use the company’s Information Hotline to ascertain company policy and procedure on important personnel issues.
  Sample item: I am likely to use the Hotline when I am faced with a possible case of employee theft. Circle your choice: SA A U D SD

• Semantic differential
  Description: Sets of bipolar adjectives, each of which may describe an item, person, or activity; each pair is separated by a set of lines or numbers; students mark one to indicate a level of feeling on the continuum from one to the other.
  Sample action: Demonstrate a positive attitude toward working with people of many cultures.
  Sample item: When I think about working with people from a culture other than my own, I feel:  Good _ _ _ _ _ Bad   Happy _ _ _ _ _ Sad, etc.

TABLE 4.4  Summary Guidelines to Consider When Selecting Assessment Formats

Directness
• Is the method the most direct measure of learners’ performances?
• Does it satisfy you that they can do the desired performance in real-world settings?

Validity
• Is the method closely matched to the ability stated in the objective?
• How easy is it to infer students’ true ability from the assessment?
• Are there enough items or performances required to establish true competence in the skill?

Reliability
• Will many different people be scoring the assessment?
• Can scoring procedures be simplified to reduce training required?

Instrument Development
• Is an existing test available to measure the behavior?
• Is there sufficient time to develop a better measure?
• Is there time to collect data to confirm reliability and validity?

Administration and Scoring Logistics
• Is there time to train personnel who will score the tests?
• Will there be enough time to score assessments?

Check Your Understanding 4.3

Objective 3 Exercise—Selecting Assessment Formats. Apply selection criteria to choose an assessment format appropriate for each of the situations described below. Place the letter of the most appropriate format (from the list that follows the situations) on the line next to each description.

Situations
______ 1. A virtual school has an online course for teachers in how to use software tools. The instructional goals call for teachers to create products (a word-processed document, spreadsheet, PowerPoint, etc.) that apply specified features of each tool. Each product will be graded according to how well it meets preset quality criteria.
______ 2. A medical school wants to assess nurse practitioners on their ability to calculate dosages of medicines for patients with various characteristics.
______ 3. An instructional design consulting firm creates a unit on collaboration skills. One of its goals is to make participants feel more positively about working on projects in small groups.
______ 4. A community college has created a vocational unit on how to do various electrical repairs, once problems are already diagnosed. Students will be graded on how well they follow steps in the correct order required to carry out repairs.
______ 5. The training unit of a pharmaceuticals company has an online course for sales representatives designed to update them on products they sell for various needs. They want to assess how well representatives can name a company product for each of several needs a doctor might state, promptly and without notes.

Assessment Formats
A. Multiple choice
B. True/false
C. Fill-in or short answer
D. Matching
E. Essay (with rubric)
F. Procedures checklist
G. Performance or product rating scale
H. Performance or product rubric
I. Likert scale
J. Semantic differential


PREPARING INSTRUCTIONAL DESIGN OBJECTIVES

Sample Student Projects


Before writing instructional design objectives for your own instructional
product, see Sample Student Projects for four examples of how novice
designers accomplished these procedures for their own projects.

Procedures for Writing Instructional Design Objectives


Experienced designers tend to consider all components of an instructional design
objective at once and do a series of rapid writes and rewrites before settling on a final
statement that will be the basis for review by others and subsequent design work.
Many designers choose to have content experts and potential users of the instruction
review and give feedback on the objectives before proceeding with further design
work. For your work, your instructor serves this role. Novice designers should take
the following step-by-step approach, breaking down each objective into distinct com-
ponents and writing each one before going back to refine each statement into a final
objective. This forces them to consider each component carefully, focusing on the es-
sential attributes of each one. However, if you are more comfortable working outside
a table, you may do that.
• Review the learning map.  In the Instructional Analysis step, you prepared a
learning map, analyzed learner needs, grouped the behaviors on the map into
learning segments, each with a behavior to be measured, and decided on a sequence for teaching the segments. Now you should review the skills or steps that lead up to learning and/or doing the behaviors. Some or all of these behaviors will become instructional design objectives.
• List the target behaviors.  The first step in writing objectives for each segment
is either to create a table similar to the one in Table 4.2 and enter the target be-
haviors into the first column, or to simply make a list of the target behaviors.
• Decide on an action, assessment method, and performance level to dem-
onstrate the first behavior.  After deciding on the most direct way to assess
that the learner can do the behavior and carefully considering validity, reliability,
instrument preparation time, and administration and scoring logistics, decide on the action, assessment method, and performance level components for the first objective. Enter them into the table or write them next to the behavior.
• Create the objective statement.  After completing the components of each ob-
jective, go back and review each objective and make any corrections necessary to
make it into a final statement. Finally, write completed statements of the objectives.
• Repeat the process for the other objectives.  As you write the statements, you
may realize that some behaviors can be combined into one objective. If neces-
sary, rewrite the statements to reflect the combined behaviors.

Check Your Understanding 4.4

Objective 4 Exercise—Steps in Preparing Instructional Design Objectives. From the following list, select only those that are steps needed to create objectives and put the steps in
correct sequence by placing the appropriate number to the left of the activity. (Reminder:
Some of the steps listed are not needed at all.)
______  Prepare more detailed learning maps for each objective.
______  Decide on an action, assessment method, and performance level for the first behavior.
______  Create the final objective statements for the first behavior.
______  Create new target behaviors, if needed.
______  Repeat the same process for all the objectives, combining behaviors, if necessary.
______  Revise each objective for face validity.
______  Review the learning map for target behaviors.
______  Make a list of the behaviors from the learning map.


Common Errors and Problems in Writing Objectives


Inexperienced designers tend to make certain common errors when writing
instructional design objectives. Look at the following problems to avoid in
each component. Each has an example that reflects the problem and a way to
correct it.

• The action is too vague to be measured.  Although “develop” is an action verb, the statement contains no activity the student will do to demonstrate a
greater “awareness.”
– Incorrect action.  Develop an awareness of the national debt.
– Correct action.  Describe causes of the national debt.
• The action focuses on the instructor rather than the student.  “Familiarize
students” places the focus on the instructor’s instruction, rather than the stu-
dents’ actions after instruction.
– Incorrect action.  Familiarize students with people’s actions that harm the
environment.
– Correct action.  Identify examples of people’s actions that harm the
environment.
• The action focuses on the students’ learning activities rather than postin-
struction activities.  “Use a CBL tutorial” places the focus on students’ learn-
ing activities, rather than the application students will make of what they have
learned after instruction.
– Incorrect action.  Use a CBL tutorial to learn how to apply actuarial
procedures.
– Correct action.  Do an actuarial analysis for a given situation.
• The action and/or assessment information are incomplete.  In this state-
ment, the focus is on the assessment, rather than the action being assessed. Also,
“World War II” does not provide specific enough information to clarify the area
of knowledge to be assessed.
– Incorrect objective.  Complete a 25-item fill-in-the-blank test on the events of
World War II. Students must get 24 of 25 items correct.
– Correct objective.  Identify important battles in the Pacific campaign of World
War II. Students complete a 25-item fill-in-the-blank test in which each statement
to be completed requires them to name a battle that fits the event described or
the role it played in the war. Students must get 24 of 25 items correct.
• The assessment does not match the required action.  The action calls for
students to demonstrate “proper procedures,” but the assessment calls for them
to match symptoms with diagnoses.
– Incorrect assessment for action.  Use proper procedures for determining
the appropriate diagnosis of elevated temperature in juveniles. Students com-
plete a matching test: the left-hand column has temperature symptoms of juve-
niles, and the right-hand column lists possible diagnoses.
– Correct assessment for action.  Use proper procedures for determining the
appropriate diagnosis of elevated temperature in juveniles. Students are given
five scenarios describing elevated temperature in juveniles and must write a
brief description of the procedures they would use to determine the correct
diagnosis. Descriptions are graded by a criterion checklist of required correct
procedures.
• The assessment does not specify how the action will be measured.  “Cor-
rectly” is not specific enough about what constitutes an adequate performance
with the spectrometer.
– Incorrect assessment.  Use correct procedures to use a spectrometer for ele-
ment identification. Students are given an element and must use the spectrom-
eter to obtain its spectrogram. All procedures must be done correctly.
– Correct assessment.  Apply correct procedures to use a spectrometer for ele-
ment identification. Students are given an element and must use the spectrom-
eter to obtain its spectrogram. The instructor uses a checklist of steps. Students
must complete each step on the checklist in correct order.
• The assessment does not require enough to confirm ability.  Students are
asked to label only two sentences. The objective should require them to do
enough different examples to confirm they can identify any and all sentences as
fact or opinion.
– Incorrect assessment.  Select an example of fact and opinion. Give students
a newspaper story written at their grade level with all sentences numbered.
Under the paragraph, they must write the number of one sentence that is fact
and one that is opinion.
– Correct assessment.  Select an example of fact and opinion. Students are
given a newspaper story written at their grade level with all sentences num-
bered. Under the paragraph, they must write the numbers of five sentences
that are fact and five that are opinion. All 10 must be correctly labeled.
• The performance level criterion is not appropriate for the type of action
and/or the assessment.  The “accuracy” criterion relates to amounts and num-
bers (e.g., all items on a test are correct), but the action does not have items; it
must be assessed by requiring certain steps.
– Incorrect performance level.  Develop a plan for taking care of a given plant
in a way that will ensure it survives. The plan must be done with 100 percent
accuracy.
– Correct performance level.  Develop a plan for taking care of a given plant
in a way that will ensure it survives. The plan must reflect appropriate ways to
address each of the five care criteria.
• The performance level criterion is not realistic; it leaves no room for error. 
Because readings from a temperature probe are likely to fluctuate, demanding
exact readings is not realistic.
– Incorrect performance level.  Use a graphing calculator and temperature
probe to take readings of liquids. Readings of graph output must be exact.
– Correct performance level.  Use a graphing calculator and temperature
probe to take readings of liquids. Readings of the graph output must be cor-
rect within a range of ± .01.

Check Your Understanding 4.5

Objective 5 Exercise—Errors in Writing Instructional Design Objectives. Read each of the following instructional design objectives, identify what is wrong with them, and rewrite
them correctly.
______ 1. Carry out an experiment on heat absorption with materials of various colors. Do
all steps in the experiment correctly. 
______ 2. Complete a worksheet of 25 items on the characteristics of planets in the solar
system. Twenty-four of the 25 items must be correct. 
______ 3. Do a t test with a given set of data. Students will be given 10 sets of data, and
must specify whether or not a t test can be calculated with the given data, then
perform the test, when possible. 
______ 4. Describe the appropriate sequence of procedures to use when taking a credit
card order over the phone. The list of steps will be given and students must put
them in correct order with 90 percent accuracy. 
______ 5. Learn how to download a plug-in from the Internet. Attend a demonstration on
how to download and use plug-ins; then download the plug-in required for a
given purpose. 




Chapter 4 Summary

• Objectives can serve several kinds of useful instructional roles (e.g., guides for reading, targets for students), but objectives for instructional design purposes are written to make sure required postinstruction performances align with assessments and instruction. Objectives also differ from content area standards; more than one objective may be needed to measure a standard.
• Clarity and specificity are essential qualities for instructional design objectives. All such objectives must be in terms of what students will be able to do and must specify the desired action the student will do post-instruction, as well as the assessment conditions and circumstances under which they must do it and the performance criterion they must meet (e.g., number of items correct or level of accuracy).
• Types of assessment methods include: mental skills and information tests (e.g., multiple choice, true/false, fill-in-the-blank, matching, short answer, essay), performance measures (graded by checklists and rubrics), and attitude surveys. Guidelines for selecting the most appropriate format include: directness of measure as a reflection of real-world performance; resources required to establish validity and reliability; and logistics required for instrument development, administration, and scoring.
• Procedures for writing instructional design objectives include: reviewing behaviors in the learning map; listing the target behaviors; deciding on an action, assessment method, and performance level to demonstrate the first behavior; creating the objective statement; and repeating the process for each of the other behaviors.
• Common errors and problems in writing objectives include: the action is too vague to be measured; action focuses on the instructor rather than the student; the action focuses on the students’ learning activities rather than postinstruction activities; the action and/or assessment information are incomplete; the assessment does not match the required action; the assessment does not specify how the action will be measured; the assessment does not require enough to confirm ability; the performance level criterion is not appropriate for the type of action and/or the assessment; and the performance level criterion is not realistic because it leaves no room for error.

References

Farrell, E. (2002, July 12). Students won't give up their French fries. The Chronicle of Higher Education. Retrieved from http://chronicle.com/weekly/v48/i44/44a03501.htm
Gagné, R., & Briggs, L. J. (1974). Principles of instructional design. New York, NY: Holt, Rinehart, & Winston.
Gay, L. R., Mills, G. E., & Airasian, P. (2009). Educational research: Competencies for analysis and application (9th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Mager, R. (1962). Preparing instructional objectives. Belmont, CA: Fearon.
Oosterhof, A. (2009). Developing and using classroom assessments (4th ed.). Upper Saddle River, NJ: Pearson Education, Merrill.
Popham, J. (2011). Classroom assessment: What teachers need to know (6th ed.). Boston, MA: Allyn & Bacon.
Seels, B., & Glasgow, Z. (1998). Making instructional design decisions (2nd ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Waugh, C., & Gronlund, N. (2013). Assessment of student achievement (10th ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Willis, J. (1995). A recursive, reflexive instructional design model based on constructivist-interpretivist theory. Educational Technology, 35(6), 5–23.

Chapter 4 Exercises

Click here to complete Exercise 4.1: New Terms and Concepts

Exercise 4.2: Questions for Thought and Discussion—These questions may be used for small group or class discussion or may be subjects for individual or group activities. Take part in these discussions in your in-person class meeting, or use your instructor-provided online discussion area or blog.
a. Willis (1995) says that "In the R2D2 (design) model, specific objectives evolve naturally from the process of design and development . . . it is not important to write specific objectives at the beginning of a (design) project." Why does the approach that Willis recommends not work for systematic design models? Can you think of any design situations where the R2D2 model would be appropriate?
b. Popham (2011) notes that the standards currently being offered by various content areas (e.g., science, mathematics, history) and by various state departments can be very helpful to those selecting objectives to assess in schools. Give an example from your chosen content area for how standards relate to instructional design objectives.

Exercise 4.3: Design Project Activities and Assessment Criteria—As you prepare instructional design objectives for your product for this course, use the following criterion checklist to assess your work:
_____ 1. Instructional design objectives have been prepared to cover all skills from the learning map that will be included in the instruction.
_____ 2. For each objective, all three required components are specified.
_____ 3. For each objective, the action is in terms of student performance.
_____ 4. For each objective, the assessment method will be a valid, reliable, and practical way to confirm that students have learned the action.
_____ 5. For each objective, the performance level is a reasonable requirement to demonstrate that students have achieved the ability specified in the objective.
Chapter
5
Developing Assessment
Materials

One accurate measurement is worth a thousand expert opinions.
—Grace Murray Hopper

Chapter 5 Topics
▪ The purpose and roles of assessments in instruction and instructional design
▪ Types of assessments and purposes of each type
▪ Essential characteristics and components of assessments
▪ How to create various kinds of assessment instruments
▪ Common errors and problems in designing assessments

Chapter 5 Learning Outcomes


1. Identify the overall purpose of assessment and three instructional roles assessments play to
fulfill this purpose.
2. Identify essential characteristics and components of each of the following types of assessment
formats: multiple choice, true/false or yes/no, matching, short answer/fill in the blank, essay,
checklist, rating scale, rubric, Likert scale, and semantic differential.
3. Select an available testing format or resource that could meet various assessment needs.
4. Apply design criteria and steps to create instruments that meet various assessment needs.
5. Identify and correct errors in assessment instruments or items.


SCENARIO
Matching up testing, assessment, and evaluation
Wiley, a seventh-grade science teacher, was at a neighborhood party talking with
his friend Matt, a local businessman, who was decrying “the sorry state of education
today.”
“I don’t envy your job, Wiley,” said Matt. “Did you see that newspaper article
yesterday about how many of our kids can’t meet the standards? What is causing all
these problems?” he asked, shaking his head. “These kids today are just hopeless.
I know you teachers work hard and do all you can, so I just don’t understand why our
kids can’t pass those tests. Our economy depends on having well-educated citizens
coming out of schools, and it looks like not many of them will be.”
Wiley cocked an eyebrow and said, “You know, I think your evaluation of the
situation is too pessimistic; things are not as bad as the stories would have you believe.
My kids do just fine on my tests, and I know many of them do great on the state’s
required science tests. I keep a pretty close eye on what happens to my students both
during my classes and afterwards, just so I can improve on something I teach if I need
to. I just feel that these high-stakes tests aren’t the whole answer, and they don’t
always tell what kids really know about a subject.”
Matt was incredulous. “What do you mean by that?” he asked. “Aren’t the tests
matched to state standards? Meeting standards is really important, isn’t it?”
“Yes, of course it is,” said Wiley, “But two things give me pause about those
tests. The first is that they only test kids in one way, usually a long multiple-choice
test. I use a lot of shorter assessments in different formats and at various times in a
course, depending on the kind of learning. Sometimes I have my kids do a lab, and
I use checklists and rubrics to assess how well they do. When I want to see if a kid is
following a particularly difficult concept, I might ask them to explain it to me verbally,
as if they were teaching me. They love that! It also helps me get them back on track if
it becomes clear they really didn’t get it the first time.”
“The other thing,” Wiley continued, “is that those tests put a lot of pressure on the
kids to tell everything they know all at one time.” He laughed, “Sometimes I think all
they’re testing that way is a kid’s stamina! When I give a major test, and it’s a bad day for
a kid for whatever reason—everyone can have a bad day—I give them another chance
to pass it. The state doesn’t have the resources to do any of the things I do. So I can’t
help but wonder if they’re measuring the kids’ knowledge of standards as well as I do.”
“I don’t know, Wiley,” said Matt skeptically. “I think you’re too soft on these kids.
It was a lot harder when I went to school. We kids either passed those tests or we
dropped out and went to work, you know?”
Wiley patted his friend’s shoulder reassuringly. “Yes, I know, Matt,” he said smiling.
“Fortunately, times have changed.”

BACKGROUND ON DEVELOPING ASSESSMENT MATERIALS


This chapter reviews four of the foundation concepts on which systematic instruc-
tional design is based. These all relate to the design of assessments, or tools used to
help provide a measure of what students have learned. These concepts require that
instruments do the following:
• Be designed so that students can demonstrate what they have learned.  As
stated in Chapter 4 on preparing instructional design objectives and assessment
strategies, the desired result of instruction is action as well as knowledge.

• Offer a valid and reliable way to measure how much learning has
occurred.  Instructors should be able to use the instruments with confidence
that they measure what they are intended to measure and that they do so consis-
tently across students and across time.
• Be closely matched to instructional design objectives.  The objectives offer
specifications for what should be measured and how students should be asked
to demonstrate they have learned.
• Be designed before instruction is designed and developed.  Finally, the
instruments should be created so that instruction is matched to them, and not the other way around. This helps confirm that instruction succeeds in bringing about the desired changes in behavior, and it provides guidelines for judging both the success of students and the quality of the instruction itself.

Listen to Learn How Assessments Matched to Objectives Improve Instruction

A Review of Assessment Purposes and Types


As the discussion between Wiley and Matt shows, there are many ways to determine
what students know at any given time, and not all assessments are used to assign a
grade or a pass/fail label. The terms assessment and measurement are linked, but
they are not necessarily synonymous. To complicate matters, not all expert authors agree on just how the two terms are linked. For example, Popham (2011),
a widely recognized assessment expert, says that assessment and measurement are
synonyms. Other authors such as Oosterhof (2009) and Gay, Mills, and Airasian
(2009) say they connote slightly different things. They say that measurement is as-
signing numbers for the purpose of “quantifying or scoring performance” (Gay et al.,
2009, p. 148), while an assessment is a tool used to help provide a measure of what
someone has learned.
All experts seem to agree, however, on the major purpose assessments serve.
Stiggins and Chappuis (2012) say that its purpose is to provide a “process of gathering
evidence of student learning to inform instructional decisions” (p. 5). All agree that as-
sessments should be linked to standards or instructional objectives and that instruction
should be created with assessment in mind. Finally, they agree that a test is one kind
of assessment instrument, but it is not the only kind. Different assessment instruments
and processes are needed for different instructional purposes.
Note that the opening scenario also shows an appropriate use of the term evalu-
ation. Wiley questions Matt’s overall appraisal of the education system, based on his
evidence and how he interprets it. An evaluation is a value judgment of the usefulness
or worth of assessments, instruction, or teaching quality. It is based on evidence from
assessments, but usually not from just one such assessment. Rather, it is an overall
judgment based on multiple data sources.
Experts also agree that accurate measurements are difficult to make and that a
measure may be affected by a number of different circumstances. Some of those con-
ditions (e.g., a student’s health or mental state) are not under the designer’s control.
Conditions that are under a designer’s control include how closely the assessment is
matched to the instructional purpose and how well it is designed to reflect essential
criteria. Those characteristics are the topics of this chapter. We begin with an overview
of the three purposes that assessments can serve.

ASSESSMENTS TO IDENTIFY PREREQUISITE ENTRY KNOWLEDGE AND SKILLS.  As you


learned in Chapter 3, many concepts that students learn build on other, previously
learned concepts. For example, a student who does not know how to add and multiply
numbers cannot learn long division. Therefore, whether or not students are able to
learn a given objective depends on whether they have learned the knowledge and
skills that are prerequisite to it. The first kind of assessment occurs before instruction
begins and helps teachers determine if students have entry behaviors or skills, which
they need to learn the new objectives but which will not be included in instruction.
The objectives on which this test is based come from the entry behaviors or skills
part of the learning map, which you learned about in Chapter 3. Depending on the
circumstances and how much time is available, this assessment can be formal or infor-
mal, written or verbal. But the implications are so important that the person doing the
assessment should make sure that results provide an accurate enough measure of
what the student knows.
The decision to be made as a result of this assessment is whether or not the stu-
dent knows enough to be able to learn successfully from the planned instruction. If
it becomes clear that one or more students lack some or all entry behaviors or skills,
the decision is likely to give remedial instruction until the students can show they
are ready to proceed to the next step. Note that the purpose of this prerequisite skills
test is different from that of a pretest. Rather than covering skills outside the goals of the instruction, pretests cover exactly the same content as the instruction. In most instructional situa-
tions, pretests are also used as diagnostic tests to determine how much of what is to be
taught the student already knows. Thus, they allow teachers and trainers to determine
if certain students do not need certain parts of the instruction and plan accordingly.
Though instructors are rarely tasked with demonstrating how much learning occurred
as a result of instruction, pretests and posttests serve this role in research studies and
in summative evaluations (see Chapter 10).

EMBEDDED ASSESSMENT ITEMS FOR PRACTICE AND DIAGNOSIS.  Practice is an essen-


tial part of most learning, because it helps learners consolidate their knowledge and
retain it better. It makes most sense to gear this practice toward performances called
for in each of the instructional objectives, which is why each objective should have
its own assessment. However, some of these assessments are used only as embed-
ded items, and will not be assessed again as separate skills. Others will be items
from the end-of-unit or end-of-course assessments, so students will see similar items
again on a final assessment. Either way, these embedded items serve an important
instructional purpose: to make sure students can demonstrate behaviors called for by
the objectives.
Another important purpose of embedded items is diagnosis, both self-diagnosis
by the students and diagnosis by teachers. By practicing the component knowledge or
skills, students get feedback that shows them if they really understood what they just
learned. Teachers and trainers get this feedback, too, and if results merit it, they may
decide to reteach the components, either to the whole group or to individuals.

ASSESSING LEARNED KNOWLEDGE AND SKILLS.  The last type of assessment is the one
most people think of when they think of assessment. It is administered at the end of
a unit or the end of a course. This activity usually results in a decision on what grade
will be awarded and/or whether or not students showed they mastered enough of the
content to receive credit for it. So-called “high-stakes tests,” mentioned in the opening
scenario, are examples of these assessments but are different from most other end-
of-instruction assessments in several ways. First, they are almost always timed and
their administration is standardized so that all students take them under similar
circumstances. This is intended to give all students the same opportunity to show what
they know and, therefore, be more “fair.” Second, they are almost always the type
of tests that can be scored quickly, usually by computer. Finally, the decisions they
enable affect students, teachers, and school systems, and data from them may drive
system-wide changes to instructional approaches. Thus, they have far more impact
than most assessments.

Check Your Understanding 5.1

Objective 1—Purpose and Instructional Roles of Assessment.

Of the following statements, which best describes the overall purpose that assessments
serve? Circle the letter of the correct answer.
A. Assign numbers in order to quantify or score student performance during instruction
B. Provide a process to gather evidence of student learning to inform instructional
decisions
C. Provide a value judgment of the usefulness or worth of instruction or teaching quality
Place a check by three of the following that are roles assessments should play in instructional
design:
______ 1. Give a valid measure of teaching quality
______ 2. Serve as ways to practice and diagnose
______ 3. Assess learned knowledge and skills
______ 4. Provide standardized measures of performance
______ 5. Identify prerequisite entry knowledge and skills

Click here for suggested answers

Essential Criteria for Effective Assessment Instruments


What determines whether an assessment really measures what it is intended to mea-
sure? First and foremost, assessments must be closely matched to objectives. Chapter 4
explained how the way objectives are phrased helps clarify the performance the stu-
dent must demonstrate, the format in which it will be demonstrated, and the criteria
students must meet to show mastery or adequate grasp of the skill. These are all essen-
tial characteristics of well-designed objectives, but they also lay an important founda-
tion for well-designed assessments. In addition to these features, all assessments must
meet other criteria, and some of these criteria are specific to the type of assessment.
This section will review these essential criteria and show how they make assessments
a more useful part of instruction for both educators and their students.

ESSENTIAL CHARACTERISTICS OF ALL INSTRUMENTS.  As Chapter 4 noted, all assess-


ments must be both valid and reliable if they are to fulfill their intended instruc-
tional purposes. Depending on the type of instrument or assessment method, there
are various ways to establish these qualities. All assessments should be tried out with
a sample of students before being put into actual instructional use, but that may not be
possible. In cases where a field test of assessments is not feasible, estimates of quality
must be based on evidence from other procedures such as expert review. Table 5.1
summarizes the types of assessment validity and reliability and how designers work
to establish them.
TABLE 5.1  Types of Validity and Reliability of Most Concern to Instructional Designers and Methods to Establish Them

Types of validity (an assessment has this quality when it . . . / establish by):
• Face validity: appears to measure what it claims to measure. Establish by expert review.
• Content validity: measures an intended content area. Establish by expert review.
• Sampling validity: represents knowledge or skills in the entire content area. Establish by expert review.

Types of reliability (an assessment has this quality when it . . . / establish by):
• Internal consistency: has consistently effective items within a test. Establish by statistical test (e.g., Cronbach alpha).
• Test-retest: is stable across time; yields similar scores in two successive administrations with the same students. Establish by statistical test (e.g., Pearson r correlation).
• Inter-rater: shows that two or more persons scoring the same product get the same score. Establish by field test estimate or statistical test (e.g., Pearson r correlation).

• Validity.  An assessment method is valid if it measures what it is supposed to measure (Gay et al., 2009; Oosterhof, 2009). Testing experts discuss many types of validity, but the most important ones for instructional designers are face validity, content validity, and sampling validity, which are arguably three terms for the same thing. Face validity is a term used more by the general public than by assessment experts. Waugh and Gronlund (2013) say that it is "the appearance
of being valid” (p. 42). Gay et al. (2009) say that face validity is “the degree to
which a test appears to measure what it claims to measure” (p. 154). They also
note it is sometimes used to mean content validity, which they define as “the
degree to which a test measures an intended content area” (p. 155). Though they
say that a check for face validity offers no sound way of determining an assess-
ment’s value, it is sometimes an initial screening step that should be followed by
more formal steps to validate content. Expert review may be seen as a kind of
face validity and content validity check. Gay et al. (2009) also note that sampling
validity is a kind of content validity that has to do with how well an assessment
represents knowledge or skills in the entire content area, rather than just a part
of it. To establish sampling validity, designers must try to make sure they have
enough items from various parts of the content to make a good measure of the
entire area.
Other characteristics of assessments that can affect validity have to do with
suitability for the students who will use them. For example, tests must be in the
language and at the reading level of students who will use them (unless their
purpose is to measure reading level in a given language), because they can-
not measure anything if students cannot understand what they ask. Also, the
language used in assessments must be compatible with students’ cultural back-
grounds. Famous examples of cultural incompatibility in assessments were items
from early forms of standardized achievement tests that referred to concepts with
which students were often not familiar, such as using “lawns” and “pineapples”
with urban students who had never heard of these things, let alone seen them.
• Reliability.  An assessment is reliable if it yields consistent results over time,
over items within the test, and over test scorers. Testing experts look at many dif-
ferent ways of establishing reliability, including internal consistency and test-retest reliability. Statistical tests can help establish these qualities (a brief computational sketch appears after this list). A Cronbach alpha, Spearman-Brown, or Guttman's split-half reliability test can help measure
internal consistency (Huck, 2012; Oosterhof, 2009), or the degree to which items
designed to measure the same thing within a test are able to produce similar
scores; and a Pearson r correlation checks for test-retest reliability, or when a
student gets similar scores in two successive administrations of the same test. Sometimes, however, it is not practical to run these tests, or instructional designers may not have the statistical expertise; in those cases, they may rely on expert review to estimate these qualities.
However, with instruments such as rubrics, which are instruments designed to
measure complex behaviors such as writing by describing each of several levels
of behavior on its elements (see Essential Characteristics of Rubrics, later in this
chapter), and which require even more subjective judgments, designers may
want to establish inter-rater reliability (Gay et al., 2009), or the quality an instru-
ment exhibits when two or more persons scoring products with the same instru-
ment are likely to get the same score. In lieu of a statistical test such as a Pearson r or a Kendall's coefficient of concordance (Huck, 2012), designers may estimate reliability by having the designer and another person familiar with the content and the products being assessed score the same student samples, then comparing the two sets of scores to see whether they tend to agree.
If they do not, it may be that the instrument is communicating differently to dif-
ferent experts. Sometimes this can be corrected with clearer wording, but when
more than one rater is scoring products or performances with a rubric, training
raters to interpret the instrument in the way the designer intended is necessary
to ensure inter-rater reliability.
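For designers who want to compute these estimates themselves, the formulas are simple enough for a spreadsheet or a short script. The following is a minimal sketch in Python, under the assumption that item-level scores have already been recorded for each student; the data values are made up for illustration, and the functions implement only the standard textbook formulas for Cronbach's alpha and the Pearson r, not any particular statistical package.

from statistics import pvariance, mean

def cronbach_alpha(item_scores):
    # item_scores: one row per student; each row holds that student's
    # score on every item (e.g., 1 for correct, 0 for incorrect).
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # one column per item
    item_variance_sum = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def pearson_r(x, y):
    # Correlation between two score lists: two administrations of the same
    # test (test-retest) or two raters scoring the same products (inter-rater).
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Made-up data: five students answering a four-item quiz.
quiz_scores = [[1, 1, 1, 0],
               [1, 0, 1, 1],
               [0, 0, 1, 0],
               [1, 1, 1, 1],
               [0, 1, 0, 0]]
print(round(cronbach_alpha(quiz_scores), 2))       # internal consistency estimate

# Made-up total scores from two administrations of the same test.
first_try  = [12, 15, 9, 18, 14]
second_try = [11, 16, 10, 17, 13]
print(round(pearson_r(first_try, second_try), 2))  # test-retest estimate

If a check like this yields a very low alpha or correlation, the items, the scoring guide, or the training of raters likely needs the kind of revision described above.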
Each type of mental skills and information assessment also has its own es-
sential characteristics. Types of assessments and the essential criteria for each type
are reviewed next. Table 5.2 summarizes these types and gives the essential criteria
for each.

ESSENTIAL CHARACTERISTICS OF MULTIPLE-CHOICE TESTS.  Arguably the most com-


monly used type of assessment is the multiple-choice test. Popham (2011) says that
this type of item “has dominated achievement testing in the United States and many
other nations” (p. 148). Multiple-choice items have been used for standardized tests
and statewide end-of-course tests, as well as for everyday classroom assessment. Sev-
eral reasons account for this popularity. First, multiple-choice items can assess many
types of skills, from simple recall to problem solving. They can also be scored quickly, and there are many statistical ways to establish essential qualities such as internal consistency and test-retest reliability. Multiple-choice items each have two components: the stem, or question part of the item, and the answer options. Among the answer options, one is correct; the other, incorrect options are called distractors.

ESSENTIAL CHARACTERISTICS OF TRUE/FALSE OR YES/NO TESTS.  Sometimes called


binary-choice or alternate-choice tests, true/false and yes/no tests are a favorite of
many teachers because they can efficiently assess a great deal of content and they
can be scored quickly. Frisbie (1992) reviewed research on this format and found that such items are more reliable than items in other answer-choice formats, but somewhat harder for students. However, they are often as difficult to create as multiple-choice items, and each item offers students a 50–50 chance of guessing the correct answer; this means that a binary-choice test must have more items than a multiple-choice test to yield an equally dependable measure. Some testing experts address the latter weakness either by asking students to rewrite a false item to make it true or by asking them to identify the false part of an item (Oosterhof, 2009,
p. 113). However, this turns items into brief essay questions that are time consuming
to score. In light of its obvious limitations, instructional designers should choose this
type of instrument only when the content seems particularly well suited for it.

TABLE 5.2  Types of Assessment Instruments and Essential Criteria for Each

Multiple choice
• Both the stem and answer options are clearly worded, at students' reading level, and contain no grammatical clues to the correct answer.
• The stem is a complete sentence, question, or problem to be solved.
• The stem contains no information that is not pertinent to the problem.
• The stem is not negatively stated.
• The answer options are about the same length to avoid clues to the correct answer.
• The answer options have parallel wording to avoid clues to the correct answer.
• The answer options are logical alternatives that reflect errors students are likely to make.
• The answer options put the correct answer in random positions.
• The answer options do not include "all of the above."
• The answer options contain one choice that is clearly correct.

True/false or yes/no
• The items are as brief as possible and clearly stated.
• The items do not use absolute or indefinite modifiers such as "no," "never," "always," "all," "sometimes," or "often" that make it very likely the choice is false, because it is unlikely there would be a circumstance in which they would be true.
• The items use no negative words or double negatives that make the question difficult to understand.
• The items contain no unintended clues to the correct answer.
• The instrument contains as many true items as false ones and intersperses them randomly.

Matching
• The longer list is on the left.
• The directions for how to match are clearly worded.
• The directions avoid having students draw lines to show matches (e.g., students write the letter of the matching choice instead).

Fill in the blank or short answer
• The items contain only one or two missing words.
• The missing word in each item is an important part of the concept being tested.
• The items are phrased so that only one answer, with very limited wording choices, is possible for each missing word.
• The items contain no grammatical clues to the answer (e.g., "a" or "an" just before the blank).
• The length of the blank is the same in all items to give no clue to possible answers.
• Example: In music compositions, a _____ is a fraction found at the beginning of a piece of music, after the clef and key signature. (Answer: time signature)

Essay
• The essay format has been selected only to measure a complex skill (e.g., a creative written work or an analysis of a position or situation).
• The topic and answer requirements are stated clearly.
• The students know the grading criteria before submitting their work.
• It is usually graded by rubric and/or checklist.

Checklist or rating scale
• All required steps are listed in the correct order.
• Students know the grading criteria before submitting their work.
• Example: Editing Checklist
  _____ All paragraphs have topic sentences.
  _____ Every sentence ends with a punctuation mark.
  _____ Each sentence begins with a capital letter, etc.

Rubric
• The elements or dimensions selected to define the behavior are comprehensive in describing the product or performance.
• The elements or dimensions selected to define the behavior are mutually exclusive.
• The descriptions that define what the product looks like at each level are clear.
• The points to assign grades are included.
• The students know the grading criteria before submitting their work.

Likert scale
• The items are as brief as possible and clearly stated.
• The directions are clear in describing how to self-assess one's beliefs in order to complete the items.
• The scale is appropriate as a response for each item.
• Example: I feel the current payroll management system works well.
  Strongly Agree  Agree  Unsure  Disagree  Strongly Disagree

Semantic differential
• The sets of bipolar adjectives make sense in describing the concept.
• The directions are clear in describing how to self-assess one's beliefs in order to complete the items.
• Example: How do you feel about mathematics?
  Warm _____ _____ _____ _____ _____ Cold
  Happy _____ _____ _____ _____ _____ Sad
  etc.

For example, students may be asked to identify examples and nonexamples of something (e.g., stable versus unstable chemical compounds). One format that packs many different items into a single exercise gives a paragraph with a number of underlined words and asks the student whether each one is or is not correct, such as whether each underlined word is or is not an example of a verb.

ESSENTIAL CHARACTERISTICS OF MATCHING TESTS.  Like multiple-choice and true/false


instruments, this type of assessment is a choice format. It contains two lists of words,
phrases, or images and asks students to select an item in one list that matches a choice
in the other. This format is especially useful for assessing classification kinds of skills.
For example, one list could be names of animal phyla, and the other could give names
or images of animals, asking students to match each name or picture with its correct
phylum. The two lists may either contain the same number of items or different num-
bers (for example, only five phylum names but ten examples to match with them), and
it is not necessary to use all of either list.

ESSENTIAL CHARACTERISTICS OF SHORT ANSWER AND FILL-IN-THE-BLANK TESTS.  Some


testing experts believe that, while they take longer to score, instruments that require
students to construct an answer are more valid than those in which they choose from
possible answers. The fill-in format eliminates guessing and makes the student gen-
erate a response. Because designers do not have to supply answer options, they are
also easier for designers to construct. Short-answer items are statements that require
a one-word or short phrase response, while fill-in items are sentences that each con-
tain a blank where students must supply a missing word. Designers usually select this
kind of item for simple recall of information, although it also may be appropriate for
assessing simple concepts.

ESSENTIAL CHARACTERISTICS OF ESSAY TESTS.  If multiple-choice tests are among the


most popular assessment formats, essay tests are among the most unpopular. Though
a good format for assessing writing quality and for skills such as being able to clearly
define and defend a position on a given issue, essay items are difficult and time-
consuming to grade. Many testing experts say there are two kinds of essay items:
brief response (a.k.a., restricted response) and extended response. Brief- or restricted-
response items set clear limits on the response (e.g., list three social and economic
conditions that led to World War I and describe each in one paragraph). When they
are assigned, designers must also provide criteria for grading them or an instrument
such as a rubric (see discussion below).

ESSENTIAL CHARACTERISTICS FOR CHECKLISTS AND RATING SCALES.  These instruments


are lists of component steps or required tasks in a complex performance such as
completing a science lab or building a model. When it includes just the list of tasks
with one point for each task accomplished in the correct order, the instrument is a
checklist or performance checklist. When a range of points is possible for each of
several required tasks, it is a rating scale. When the designer wants varying points
assigned for each step, depending on how well the student accomplishes them, a
rubric (described below) is preferable to a rating scale, because rubrics make the quality
required for each point level clearer to students.

ESSENTIAL CHARACTERISTICS OF RUBRICS.  Brookhart (2013) says that rubrics (scoring


rubrics) are instruments consisting of “a coherent set of criteria for students’ work
that includes descriptions of levels of performance quality on the criteria” (p. 1). They
are often used to measure quality of complex behaviors such as writing or product

[Photo] Rubrics are useful assessment instruments to help measure complex skills such as this one, using computer-aided design (CAD) to do drawings. Credit: Courtesy Bill Wiencke

development. They provide a basis for grading by supplying a matrix that describes
what the product looks like at various levels of quality (e.g., poor, acceptable, good,
excellent) on each of several elements or dimensions (e.g., clarity, grammar, mechanics, organization, and vocabulary). Rubrics came into use in the 1970s for grad-
ing writing samples, but became increasingly popular when alternative assessments
to traditional testing came into use. Creating an effective rubric requires the designer
to identify specific qualities that make the performance acceptable and describe how
these qualities look at various levels of performance.

ESSENTIAL CHARACTERISTICS OF LIKERT SCALES.  Instruments such as the Likert scale


have been specifically designed to measure attitudes (Likert, 1932). These instruments
contain sets of items, each asking the persons completing them to select how they feel
about something on a scale from least to most (e.g., agreement from strongly agree
to strongly disagree or frequency of occurrence from always to never). They measure
attitudes by asking students to reflect on how they feel toward a topic or how they
might behave in response to a situation.

ESSENTIAL CHARACTERISTICS OF SEMANTIC DIFFERENTIALS.  A semantic differential


is an instrument that asks students to express how they feel about a topic (e.g., learn-
ing social studies or people of other cultures) by selecting a position on a continuum
between two bipolar adjectives (e.g., warm and cold). The continuum is shown as
a set of lines between the two adjectives. It is also used to measure attitudes and is
especially useful for younger children, because it minimizes reading.

Check Your Understanding 5.2

Objective 2—Characteristics and Components of Assessments. Select a word or phrase from the following list to complete each of the statements about assessment characteristics and instrument types. Place the letter of each answer in the appropriate space.

Assessment Characteristics/Types
A. Validity
B. Reliability
C. Multiple choice
D. True/false or yes/no
E. Matching
F. Fill in the blank
G. Short answer
H. Essay
I. Checklist
J. Rating scale
K. Rubric
L. Likert scale
M. Semantic differential

Statements About Assessments
1. While both _____ and _____ are assessment instruments that can list component steps or required tasks in a complex performance such as completing a science lab, only one of these, the _____, offers a range of points assigned to each required task.
2. Items in _____ instruments ask the completers to indicate how they feel about a statement by selecting a level on a scale from least to most (e.g., strongly agree to strongly disagree).
3. Face _____, content _____, and sampling _____ are three types of _____.
4. _____ items must have a stem that is clearly worded, a clearly correct answer, and several other answer options that reflect errors students are likely to make.
5. _____ instruments must have sets of bipolar adjectives that make sense in describing the concept and clear directions on how to complete it in a way that self-assesses one's beliefs.
6. Test-retest, internal consistency, and inter-rater are three types of _____.
7. _____ instruments contain two lists of words, phrases, or images and must have directions on how to match them that avoid drawing lines.
8. The _____ instrument has elements that describe components of a complex behavior or product in various dimensions or degrees of quality.
9. The _____ instrument can be used to ask students to identify examples and nonexamples of something or can give students a paragraph with a number of underlined words, and ask the student if each is or is not an example of a concept.
10. While both the _____ and the _____ require a one-word or phrase answer, only the _____ requires an answer after a question.
11. The _____ assignment can measure the student's grasp of complex ideas or skills using either brief (restricted) response or extended-response format; however, either is time-consuming to grade.

Click here for suggested answers



CREATING TESTS, PERFORMANCE MEASURES,


AND ATTITUDE INSTRUMENTS

Sample Student Projects


Before proceeding to create assessments for your own instructional
product, see Sample Student Projects for four examples of how novice
designers accomplished these procedures for their own projects.

Testing Formats and Tools


Technological developments in testing software and online environments have
increased the number of options for creating and implementing assessments. The main
benefit of using these technology-enhanced formats in lieu of paper-pencil formats is
efficiency. The designer can generate most instruments quickly and the instructor does
not have to grade most of the tests, so it is easier to assess more frequently. Short-answer and essay tests are the exception, because they require instructor review to
determine correctness. But this may be changing as automated essay scoring, or
using programs that employ language-processing capabilities to assign grades to
student essays, becomes increasingly feasible (Bridgeman, Trapani, & Attali, 2012).
In addition, many technology-enhanced assessment tools summarize data across
students to facilitate the instructor’s decision making. However, whether or not a
designer can specify one of these technology-enhanced formats for classroom use
depends largely on whether instructors will have access to the tools described here.
This section describes features of six assessment formats and tools that have become
popular in recent years.

COMPUTER-BASED TEST GENERATORS.  Test generators are a category of software tool


that allows creation of various kinds of test forms or versions based on items typed
in previously by an instructor. Test generators offer many benefits over other, more
general-purpose tools such as word-processing software. These include easy revision
procedures, random generation of items, selection of items based on criteria, and
printable answer keys (Roblyer & Doering, 2013, p. 148).
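The selection logic behind such tools is easy to picture. The sketch below is a hypothetical illustration, not any product's actual algorithm: it draws a set number of items per objective at random from a small item bank and shuffles them into alternate versions of the same test. The item bank structure, field names, and item text are assumptions made for the example.

import random

# Hypothetical item bank: each item is tagged with the objective it assesses.
item_bank = [
    {"id": 1, "objective": "4.1", "stem": "An instructional design objective must include . . ."},
    {"id": 2, "objective": "4.1", "stem": "Which component is missing from this objective . . ."},
    {"id": 3, "objective": "4.2", "stem": "Which assessment format best fits this skill . . ."},
    {"id": 4, "objective": "4.2", "stem": "A rubric is preferable to a rating scale when . . ."},
    {"id": 5, "objective": "4.2", "stem": "Which validity concern applies to this item . . ."},
]

def generate_form(bank, items_per_objective, seed=None):
    # Draw the requested number of items for each objective at random,
    # then shuffle the combined form so item order differs across versions.
    rng = random.Random(seed)
    form = []
    for objective, count in items_per_objective.items():
        pool = [item for item in bank if item["objective"] == objective]
        form.extend(rng.sample(pool, count))
    rng.shuffle(form)
    return form

# Two versions built from the same blueprint: 1 item on objective 4.1, 2 on 4.2.
version_a = generate_form(item_bank, {"4.1": 1, "4.2": 2}, seed=1)
version_b = generate_form(item_bank, {"4.1": 1, "4.2": 2}, seed=2)
print([item["id"] for item in version_a])
print([item["id"] for item in version_b])

Drawing a fixed number of items per objective is also one simple way to protect the sampling validity discussed earlier, because every version covers each content area.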

RUBRIC GENERATORS.  Several Internet sites offer free rubric generators. The designer
follows a set of prompts, and then the system creates a rubric that can be printed
out; some may be referenced as an online location (Roblyer & Doering, 2013, p. 150).
Popular rubric generation sites are RubiStar and Rubric Maker.

COMPUTER-BASED TESTING SOFTWARE.  Another group of tools allows students to take tests right on the computer screen; others offer on-screen testing along with the same benefits described earlier for computer-based test generators. Some on-screen testing programs make possible computer adaptive testing (CAT), in which the computer continually analyzes a student's performance during a test and presents more or less difficult items depending on the student's answers. Another category of software tools is grading programs in which students take tests by darkening in circles beside answers on a scannable sheet of paper (sometimes called a "bubble sheet") that is graded by inserting it into a scanning machine.

STUDENT RESPONSE SYSTEMS (CLICKERS).  Also known as personal response systems,


classroom response systems, or clickers, student response systems (SRS) also comprise
a range of tools that usually include a set of handheld devices and software that per-
mits a group of students to answer the same question simultaneously, analyzes the
responses, and displays them in summary form for the teacher (Roblyer & Doering,
2013). These tools are especially helpful for displaying embedded practice items and
are often used in conjunction with interactive whiteboards (IWB), or devices that
include a display screen connected to a computer and digital projector. Now that many
students have smartphones that can be used in place of clicker systems to interact with
IWBs, it has become even easier for teachers to insert embedded items for practice or
to diagnose problems.

ONLINE SURVEY SITES.  These online tools originally came into common use to gather
survey data, primarily for research purposes, but they may also be used to offer vari-
ous kinds of assessments, including attitude measures and achievement measures.
Online survey tools allow designers to create and implement their own attitude sur-
veys and questionnaires, but the sites may also be used for giving tests online. The
sites provide features that make it quick and easy to design many kinds of items; the
most commonly used formats are multiple choice and Likert scale. After creating an
online instrument, the designer can e-mail an invitation that contains a link to the site,
and people in any location that has an Internet connection can fill in answers to the
items. The survey site automatically collects and organizes data, and the instrument
designer can request the system to display data in charts and graphs. For example, if a
designer wanted to see how people were responding on a given item, the online site
can display that information in a bar chart labeled with percentages of respondents
selecting each answer to each question.
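The summary such sites display is simple to picture. The following minimal sketch, which is not any vendor's implementation, tallies hypothetical responses to a single item and converts the counts to the percentages a bar chart would be labeled with.

from collections import Counter

# Made-up responses to one survey item, as an online tool might collect them.
responses = ["Agree", "Strongly Agree", "Agree", "Unsure", "Disagree",
             "Agree", "Strongly Agree", "Agree", "Unsure", "Agree"]

counts = Counter(responses)
total = len(responses)

# Percentage of respondents selecting each answer, as a bar chart would label them.
for option in ["Strongly Agree", "Agree", "Unsure", "Disagree", "Strongly Disagree"]:
    share = 100 * counts.get(option, 0) / total
    print(f"{option:17} {share:5.1f}%  {'#' * counts.get(option, 0)}")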
Most online survey tools allow free use for a limited time, or for shorter in-
struments, but if designers want to be able to download a data file of responses,
they must usually pay a usage fee. Commonly used sites include SurveyMonkey and
Zoomerang.

ASSESSMENT FEATURES OF CONTENT MANAGEMENT SYSTEMS (CMS).  If the course is


being designed for online or blended environments, the CMS that houses the course
provides an online assessment component. These systems allow course designers to
create tests or surveys that students can take at the site. Data are automatically col-
lected, and the instructor and/or designer can obtain data in summary form. Popu-
lar CMSs available for a fee are eCollege, Blackboard, and Desire2Learn (D2L), but
there are a number of open-source systems available for free, including Moodle and
Drupal.

SOURCES OF EXISTING INSTRUMENTS.  The next step in assessment design is to create


the instruments. However, creating valid and reliable tests is time consuming. De-
signers should begin this task by reviewing sources of existing tests on the topic and
(if copyright permits), selecting either an entire existing test or parts of it in order
to save design time. More traditional, validated assessments may be found through
library searches of databases such as the Mental Measurements Yearbook and Tests in
Print. For checklists and rubrics, check out some of the following websites:
• Internet4Classrooms:  This site contains a collection of K–12 resources that
include sample tests for most content areas and grade levels and rubrics for many
products and processes.
• Kathy Schrock’s Guide to Everything:  This site contains one of the largest
available collections of rubrics and links to rubric generators.

Check Your Understanding 5.3

Objective 3—Assessment Formats and Tools. Select a technology-enhanced assessment tool to meet each of several assessment needs. Place the letter(s) of the assessment tool (from the following list) on the line next to each description of an assessment need.

Technology-Enhanced Assessment Tools
A. Computer-based test generators
B. Rubric generators
C. Computer-based testing software
D. Student response systems (clickers)
E. Online survey tools
F. Assessment features of content management systems (CMS)

Assessment Needs
______ 1. For a training workshop in statistical analysis, a designer wants to gather pretest data from the workshop participants to determine what they already know. The designer wants to have participants take a pretest from their home locations before training begins, then display a chart of the results on the first day of the workshop.
______ 2. A designer is updating an in-person math workshop to help
young adults prepare for an exam to get the equivalent of a
high school diploma. She wants to intersperse embedded
items throughout one unit to make sure participants are
grasping key concepts. She wants to gather responses quickly
and be able to see data summaries instantly, so the instructor
will be able to provide additional help, when needed.
______ 3. As a designer creates a remedial mathematics unit for
community college students, he would like to include a
test-item bank the instructor can use to create several
different versions of the same test. This will allow students
to retake tests, when needed, without taking the same test
each time.

Click here for suggested answers

Designing Assessment Instruments


This section describes step-by-step procedures for designing several of the most popular
kinds of assessment instruments.

PROCEDURES FOR DESIGNING MENTAL SKILL AND INFORMATION TESTS.  Steps for designing multiple-choice, true/false or yes/no, matching, and short-answer/fill-in-the-blank tests are generally the same. These steps include:
1. Review the instructional design objective and review all the content to be
sampled.
2. Review the decision that was made when instructional design objectives were
written about the number of items needed for each area of content. This deci-
sion has implications for sampling validity, so, if necessary, adjust the number to better reflect the entire content area.
3. Draft the items. In the case of multiple-choice items, determine the most com-
mon errors students are likely to make in order to write good distractors. Make
sure items meet criteria specified earlier for the type of items.
4. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
5. If possible, ask content area experts to review the instrument for: how well the
items reflect knowledge and skills, as specified by the objective; clarity of word-
ing; and accuracy of answers that the designer has identified as correct.
6. If possible, field test the instrument with sample students and revise as needed
before implementing it with larger groups.

PROCEDURES FOR DESIGNING PERFORMANCE CHECKLISTS AND RUBRICS.  If the instruc-


tional design objective calls for a complex performance such as writing an essay or
creating a research report, create a checklist or rubric to assess the product. For check-
lists, design steps include:
1. Review the steps required to perform the skill. These should have been identified
in the instructional analysis step described in Chapter 3.
2. Create the checklist, assign points, and design the grading scale.
3. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
4. If possible, ask content area experts to review the checklist to make sure steps
are comprehensive and in the correct order.
5. If possible, field test the checklist with sample students and revise as needed
before implementing it with larger groups.
For rubrics, design steps include the following (a brief scoring sketch appears after these steps):
1. Review the skill and list elements (sometimes called dimensions) that must be
present in a high-quality or expert performance. For example, the ReadWrite-
Think Writing Rubric lists dimensions as: content/ideas, organization, vocabu-
lary/word choice, voice, sentence fluency, and conventions. It is important that
the list be comprehensive and the elements be mutually exclusive.
2. Decide what levels of performance are needed and how they should be labeled
(e.g., poor, good, superior; does not meet expectations, partially meets expecta-
tions, meets most expectations, meets all expectations, exceeds expectations).
The more levels of performance are specified, the more difficult it is to define
performance in ways that do not overlap each other. A greater number of levels
also make use of the rubric more time consuming.
3. Write the description for each cell in the rubric. Be sure each description focuses
only on the dimension it is addressing and differs in a measurable way from the
descriptions of other levels of quality.
4. Assign points and a grading scale.
5. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
6. If possible, ask one or more content area experts to review the rubric to make
sure steps are comprehensive and in the correct order.
7. If possible, field test the rubric with sample students, check for inter-rater
reliability, and revise as needed before implementing it with larger groups.
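Here is the brief scoring sketch noted above. It represents a hypothetical rubric as data in Python; the dimensions, level labels, descriptions, and point values are invented for illustration. The sketch simply shows that each cell pairs a description with a point value, and that two raters' totals for the same product can be compared as a rough inter-rater check (step 7).

# Hypothetical three-level rubric for a short written report. Each dimension
# maps a level label to (points, description of what the product looks like).
rubric = {
    "organization": {
        "poor":       (1, "Ideas appear in no discernible order."),
        "acceptable": (2, "Most ideas follow a logical order."),
        "excellent":  (3, "All ideas follow a logical order with clear transitions."),
    },
    "mechanics": {
        "poor":       (1, "Frequent spelling or punctuation errors."),
        "acceptable": (2, "A few minor errors that do not impede reading."),
        "excellent":  (3, "No noticeable errors."),
    },
}

def score(ratings):
    # ratings: the level a rater chose for each dimension.
    # Returns total points and the maximum possible, so a grading scale
    # (e.g., percentage cutoffs) can be applied afterward.
    total = sum(rubric[dim][level][0] for dim, level in ratings.items())
    maximum = sum(max(points for points, _ in levels.values()) for levels in rubric.values())
    return total, maximum

# Two raters score the same student product; comparing their totals is a rough
# field-test check on inter-rater reliability.
rater_1 = {"organization": "excellent", "mechanics": "acceptable"}
rater_2 = {"organization": "acceptable", "mechanics": "acceptable"}
print(score(rater_1), score(rater_2))   # (5, 6) and (4, 6)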

PROCEDURES FOR DESIGNING ATTITUDE MEASURES.  If the instructional design objec-


tive calls for a change in attitude to occur as a result of instruction, create a Likert
scale instrument or a semantic differential to assess this change. For Likert scale instruments, design steps include the following (a brief scoring sketch appears after these steps):
1. List behaviors and/or beliefs that would be characteristic of a person that did or
did not have the attitude being assessed. For example, if the designer is trying
to assess changes in attitudes toward plagiarism, the list might include: inten-
tion to look for products on the Internet to fulfill an assignment, and a belief
that this behavior is okay under certain conditions such as not having enough
time to write the assignment oneself.
2. Write statements that reflect each behavior or characteristic.
3. Select an appropriate scale for the instrument such as strongly agree, agree,
uncertain, disagree, and strongly disagree.
4. Write clear directions for completing the instrument.
5. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
6. If possible, ask one or more content area experts to review the scale to make
sure the items are clear and tied closely to expected behaviors and beliefs.
7. If possible, field test the scale with sample students, check for internal consis-
tency, and revise as needed before implementing it with larger groups.
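Here is the brief scoring sketch noted above. It shows how Likert responses are commonly converted to numbers and summed; the 1-to-5 coding and the reverse-scoring of negatively worded statements are common conventions rather than fixed rules, and the items and responses are made up for illustration.

# Conventional 1-5 coding for the scale chosen in step 3.
scale = {"strongly disagree": 1, "disagree": 2, "uncertain": 3,
         "agree": 4, "strongly agree": 5}

# Hypothetical items; negatively worded statements are flagged so their scores
# can be reversed (agreeing with them indicates the less desired attitude).
items = [
    {"text": "Citing sources is a waste of time.", "reverse": True},
    {"text": "I would report plagiarism if I saw it.", "reverse": False},
]

def item_score(response, reverse):
    # Convert one response to a number, reversing negatively worded items
    # so that a higher score always means a more positive attitude.
    value = scale[response]
    return 6 - value if reverse else value

def total_score(responses):
    # responses: one answer per item, in item order.
    return sum(item_score(r, item["reverse"]) for r, item in zip(responses, items))

# One respondent's answers, in item order.
print(total_score(["disagree", "agree"]))   # (6 - 2) + 4 = 8 of a possible 10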
For semantic differential instruments, design steps include:
1. Decide on the stimulus word, phrase, or sentence. For example, if the designer is
trying to assess changes in attitudes toward mathematics, just the phrase “Doing
math” would suffice. If assessment addresses changes in attitudes toward people
of another culture, say “How do you feel when you think of people who have a
different skin color than yours?”
2. Decide on adjective pairs (e.g., good . . . bad; happy . . . sad) and put the instru-
ment in the format with three to seven lines in between the words.
3. Decide on the assessment delivery format.

Check Your Understanding 5.4

Objective 4—Designing Assessment Instruments. For one or more of the following types
of assessment, design an example product that meets all required criteria and is appropriate
for the type of skill you want to assess. Be sure to include appropriate directions for using the
instrument. If you and your instructor agree, you can substitute similar or other instruments
for the ones listed below.
1. A ten-item true/false or yes/no test that identifies examples versus nonexamples of an
appropriate content item from your area of expertise (e.g., nouns versus not nouns,
mammals versus not mammals, scenario describing behaviors that are either correct or
not correct)
2. A performance checklist or rating scale for how to balance a checking account
3. A rubric for assessing a web page developed for a technology education class
4. A ten-item Likert scale for assessing attitudes toward gang membership
5. A ten-item multiple-choice test that assesses an appropriate skill from your area of
expertise

Click here for suggested answers

Common Errors and Problems with Assessment Instruments


Inexperienced designers who are not assessment experts tend to make
certain common errors when writing assessments. Look at the following
problems to avoid when designing assessments. Each has an example that
reflects the problem and a way to correct it.

• In a multiple-choice item, a stem is an incomplete statement or problem.  In


the incorrect first item, the stem or stimulus part of the item is incomplete.
– Incorrect.  In the example below, the number 5 is


– Correct.  In the division problem above, what is the correct name for the
number 5?

     6
5 ) 31
    30
   r 1
A. divisor
B. dividend
C. quotient
D. remainder
• In a multiple-choice item, a stem gives an unintended clue to the correct
answer.  The word “heroine” gives a hint that the gender of the character is
female.
– Incorrect.  Which of the following is Charlotte Bronte’s most famous
heroine?
– Correct.  Which of the following is Charlotte Bronte’s most famous character?
A. Jane Eyre
B. Edward Rochester
C. Linton Heathcliff
D. William Crimsworth
• In a multiple-choice item, options give unintended clues as to the correct
answer.  The correct option (B) is much longer and more detailed than incor-
rect options. Students may be drawn to it because it stands out.
– Incorrect.  Which of these statements describes how “falsifiability” applies to
a hypothesis?
A. The hypothesis is obviously false.
B. If a hypothesis is false, then it is possible to demonstrate its falsehood.
C. The hypothesis may be false.
D. The hypothesis sounds false.
– Correct.  Which of these statements describes how “falsifiability” applies to a
hypothesis?
A. The hypothesis has been previously demonstrated to be false.
B. If a hypothesis is false, then it can be demonstrated to be false.
C. It is possible that the hypothesis is false under certain conditions.
D. The hypothesis sounds false; therefore, it is likely to be false.
• In a multiple-choice item, distractors are not ones students are likely to
select.  The distractors in the following list are diseases that many people know
from common knowledge (and/or common sense) are not airborne.
– Incorrect.  Which of these is an example of an airborne disease?
A. Childhood diabetes
B. Chicken pox
C. High blood pressure
D. Athlete's foot
– Correct.  Which of these is an example of an airborne disease?
A. Typhoid
B. Chicken pox
C. Dengue fever
D. Malaria

• In a true/false or yes/no item, the item uses an absolute term that automati-
cally makes the item likely to be false.  In the incorrect example, the word
“all” signals that the item is most likely false.
– Incorrect.  During World War II, all French citizens initially supported the
Vichy regime.
– Correct.  During World War II, the Vichy regime initially had popular support.
• In a matching item, the directions for how to complete it are not clear.  The directions in the incorrect item are vague as to which list contains the tools and which contains their uses; also, it is not clear how to show the matches.
– Incorrect.  Directions: Match the tools with their uses.
– Correct.  Directions: Match each of the ten tools on the left with the function
it fulfills in electrical work, listed on the right. Write the letter of the tool func-
tion on the line next to the tool.
• In a fill-in-the-blank item, more than one answer could correctly fill the blank.  In the incorrect item, the correct answer could be many different things, ranging from "state" to "popular vacation site."
– Incorrect.  Florida is an example of a/an _____.
– Correct.  Florida is an example of a land mass called a/an _____.
• In an essay, the scope of the task and grading criteria are unclear.  In the
incorrect example, directions do not specify how much the student should write
or limit what they should write about; also, it is not clear how the work will be
graded.
– Incorrect.  Discuss the impact of Manifest Destiny. Do your best work.
– Correct.  Explain the concept of Manifest Destiny and describe two expansionist initiatives it was used to justify. Write a paragraph on the definition and one on each initiative (5 points for each paragraph = 15 points total).
• In a Likert scale, the scale does not provide a direct response to the item. 
In the incorrect item below, the scale does not match the question.
– Incorrect.  How likely are you to smoke cigarettes in the future?
Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree
– Correct.  I am likely to smoke cigarettes in the future.
Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree
• In a rubric, performances described in cells are not sufficiently distinct
from each other or are trying to describe more than one element at a
time.   

Check Your Understanding 5.5

Objective 5—Identifying and Correcting Errors in Assessment Instruments. Indicate whether or not each of the following items or directions is stated appropriately to meet all assessment criteria. If they are not stated appropriately, identify what is wrong with them and write them correctly.
______ 1. Which of the following planets is closest to Earth?
a. Mars
b. Sun
c. Moon
d. Stars
______ 2. In a rubric to assess a sales presentation, the following is the complete list of
dimensions to be measured: visual aids, background knowledge of products.


______ 3. Essay assignment: Compare the positions of the two U.S. parties during the 1972
presidential election. Grades are based on clarity and detail.

______ 4. True/false item: Social networks are always dangerous places for middle-school
kids.

______ 5. How likely are you to use company-accepted safety procedures you have learned
for situations when you are transferring stock on shelves in the store?
Strongly Agree, Agree, Unsure, Disagree, Strongly Disagree

Click here for suggested answers

Chapter 5 Summary

• Assessments may serve many different kinds of purposes in instruction, including: identifying prerequisite entry knowledge and skills, practicing and diagnosing performances, and assessing learned knowledge and skills.
• Essential criteria for effective assessment instruments include both validity (i.e., face validity, content validity, and/or sampling validity) and reliability (internal consistency, test-retest, and/or inter-rater). However, each of the following kinds of assessment instruments must also meet its own essential criteria: multiple-choice tests, true/false or yes/no tests, matching tests, short answer and fill-in-the-blank tests, essay tests, checklists and rating scales, rubrics, Likert scales, and semantic differentials.
• Several kinds of computer-based assessment tools are available to make development and use of assessments more efficient. These include: computer-based test generators, rubric generators, computer-based testing software, student response systems (clickers), online survey sites, and assessment features of online content management systems (CMS). Also, designers may choose to use online sources of existing instruments.
• Procedures for designing mental skill and information tests include: reviewing the instructional design objective and all the content to be sampled; reviewing previous decisions about the number of items needed for each area; drafting the items; and deciding on the testing format. Procedures for designing performance checklists and rubrics include: reviewing the steps required to perform the skill; creating the checklist; assigning points and designing the grading scale; and deciding on the testing format. For rubrics, procedures include: reviewing the skill and listing the elements or dimensions that must be present; deciding what levels of performance are needed and how they should be labeled; writing the description for each cell in the rubric; assigning points and a grading scale; and deciding on the testing format. For Likert scale instruments, design steps include: listing behaviors and/or beliefs to include; writing statements that reflect each behavior or characteristic; selecting an appropriate scale; writing clear directions; and deciding on the testing format. For semantic differentials, design steps include: deciding on the stimulus word, phrase, or sentence; and deciding on adjective pairs. For all instruments, final procedures should include, if possible, asking content area experts to review the instrument and field testing it with sample students.
• Common errors and problems in creating assessment instruments include the following. In a multiple-choice item, the stem is an incomplete statement or problem, or it gives an unintended clue to the correct answer; options give unintended clues as to the correct answer; and distractors are not ones students are likely to select. In a true/false or yes/no item, the item uses an absolute term that automatically makes the item likely to be false. In a matching item, the directions for how to complete it are not clear. In a fill-in-the-blank item, more than one answer could correctly fill the blank. In an essay, the scope of the task and grading criteria are unclear. In a Likert scale, the scale does not provide a direct response to the item. In a rubric, performances described in cells are not sufficiently distinct from each other or are trying to describe more than one element at a time.

References

Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.
Brookhart, S. (2013). How to create and use rubrics. Alexandria, VA: Association for Supervision & Curriculum Development (ASCD).
Frisbie, D. (1992). The multiple true-false item format: A status review. Educational Measurement: Issues and Practice, 11(4), 21–26.
Gay, L. R., Mills, G. E., & Airasian, P. (2009). Educational research: Competencies for analysis and application (9th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Huck, S. (2012). Reading statistics and research (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 2(140), 44–53, 55.
Oosterhof, A. (2009). Developing and using classroom assessments (4th ed.). Upper Saddle River, NJ: Pearson Education, Merrill.
Popham, J. (2011). Classroom assessment: What teachers need to know (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Roblyer, M. D., & Doering, A. (2013). Integrating educational technology into teaching (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Stiggins, R., & Chappuis, J. (2012). An introduction to student-involved assessment for learning (6th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Waugh, K., & Gronlund, N. (2013). Assessment of student achievement (10th ed.). Columbus, OH: Pearson Education.
Wiggins, G. (1993). Assessment: Authenticity, context and validity. Phi Delta Kappan, 75(3), 200–214.
Young, J. R. (2011, August 12). Professors cede grading power to outsiders—even computers. The Chronicle of Higher Education, 57, A1, 4–5.

Chapter 5 Exercises

Click here to complete Exercise 5.1: New Terms and Concepts

Exercise 5.2: Questions for Thought and Discussion—These questions may be used for small group or class discussion or may be subjects for individual or group activities. Take part in these discussions in your in-person class meeting, or use your instructor-provided online discussion area or blog.
a. Wiggins (1993) was an early proponent of so-called "authentic assessment." He felt that traditional methods of student assessment (e.g., multiple-choice tests) cannot measure complex intellectual skills required in real-life situations and that use of such assessments tends to narrow the curriculum to skills that can be tested in this way. He said, "We should be seeking a more robust and authentic construct of 'understanding' and a more rigorous validation of tests against that construct. . . the aim of education is to help the individual become a competent intellectual performer, not a passive 'selector' of orthodox and prefabricated answers" (p. 202). Are there situations for which "traditional" assessments are helpful? Are there situations in which "authentic assessments" are the better choice?
b. Young (2011) reports that allowing computers to grade students' written work (e.g., essays) is becoming more common in higher education. He notes that "Software can grade essays thanks to improvements in artificial-intelligence techniques. Software has no emotional biases, either, and . . . machines have proved more fair and balanced in grading than humans" (p. 4). What might be positive and negative implications of this assessment practice for education and training?

Exercise 5.3: Design Project Activities and Assessment Criteria—As you prepare assessments for your product for this course, use the following criterion checklist to assess your work:
_____ 1. The assessment activity, criteria, and grading strategy in the assessment are closely matched to the instructional design objectives in each case.
_____ 2. Each instrument meets all essential criteria for that type of assessment.
_____ 3. An appropriate testing format has been selected that is accessible to the students.
_____ 4. The instruments have been put into their testing formats (e.g., online, software package).
_____ 5. If possible, the assessment has been subjected to an expert review and/or field test with students to increase validity.
