4
Preparing Instructional Design Objectives and Assessment Strategies
Chapter 4 Topics
▪ The role of objectives in instruction and instructional design
▪ Essential characteristics and components of instructional design objectives
▪ How to decide on appropriate assessment formats for various types of objectives
▪ Procedures for writing effective instructional design objectives
▪ Common errors and problems in writing instructional design objectives
SCENARIO
The essential role of objectives and assessments
Aubrey Fair was an instructional designer for a large training consultant firm. A manu-
facturing company had hired his firm to update training in antitrust laws that it re-
quired all its managers to take. It was imperative that the company’s managers knew
these laws well and didn’t inadvertently break any antitrust rules, because the com-
pany would be held responsible for any infractions. For the last few years, the workshop
had been offered by the company’s lead attorney, an expert in antitrust law.
Aubrey was not told what needed to be updated; he was simply instructed to begin by
meeting with the attorney.
Aubrey greeted the attorney cordially. “So how long have you been offering
these antitrust workshops?” he asked. “For about two-and-a-half years,” replied the
attorney stiffly, “and, frankly, I don’t see why we need to change them at all. They’ve
been working just fine up to now.” Aubrey immediately sensed the need to tread care-
fully. He was obviously invading the attorney’s domain, but he needed to know how
the training had been conducted in the past.
“Yes, I’ve heard you’re the company’s legal expert, and I’m very interested in
the approach you’ve been using in your workshops,” said Aubrey amiably. “Can you
share some of your materials with me? I’m especially interested in the objectives of
the workshop.”
“They’re quite straightforward, as you can see,” said the attorney, who handed
him a notebook. “This is my instructor manual and handouts.” Aubrey read a few of
the statements on the list labeled “Workshop Objectives.” They included:
• Review the definition of “antitrust” as reflected in the basic laws.
• Review the purposes and main points of each of the laws.
• Give students an appreciation for the purposes of antitrust laws in business.
• etc.
Aubrey said, “Hmmm, I see. How do you tell if the workshop participants learn
what you have in mind? Do you have tests or assessments to measure what they’ve
learned?”
The attorney said, “Oh, yes, I can always tell they really get it. There are always
a lot of good questions, and everyone is very enthusiastic about the content. I always
have one or two come up afterward to shake my hand and tell me they’re glad they
attended. These are high-level guys, though, and I feel they would find tests demean-
ing. We do a debriefing at the end and go through a checklist together to make sure
they know everything they should. It really works well.” Aubrey thanked the attorney
and asked to take the notebook with him to look over.
organizations never created actual assessments linked to these objectives or made sure
that instruction was in place to help bring about the outcomes they specified. There-
fore, the most important role of instructional objectives was never served. If objectives
are to be most useful in improving instruction, they are not ends in themselves, but
rather the first in a series of carefully linked design activities.
STANDARDS VS. OBJECTIVES. Over the last decade, content standards have become an
increasingly well-known and important kind of performance “target” in education
and training. For example, in the United States, every state has a
set of standards for what students are to learn in each content area. In addition,
Common Core Standards have been created by the National Governors Association
Center for Best Practices and the Council of Chief State School Officers (http://
www.corestandards.org). At this time, 45 states and the District of Columbia, four
territories, and the Department of Defense Education Activity have adopted the
Common Core Standards. While these are definitely statements of what students
should be able to do after instruction, they are more global in nature than those
required for instructional design purposes. For example, Figure 4.1 compares one of
the Common Core Standards with three different objectives that might be designed to
measure achievement of that standard; note how a single standard can be assessed in
many different ways, with different actions and criteria for meeting it.
TERMS FOR OBJECTIVES. Various terms have been used to describe the behaviors
students should be able to do as the result of instruction. These include: behavioral
objectives, instructional objectives, objectives, outcomes, outcome-oriented objec-
tives, and performance objectives. However, all of these terms are used in contexts
other than systematic instructional design, and the meaning becomes clear only if
the reader knows the context and purpose for which they are being used. The term
instructional design objective is used in this design model to clarify that it is the prod-
uct of this instructional design step: a statement of behaviors and assessment criteria
that instructional designers write to specify what learners should be able to achieve as
a result of the instruction. This term also helps differentiate statements of objectives
that are useful for design purposes from those given to students or stated in textbooks,
because the latter may not be as detailed or stated in the same way as those needed
to drive instructional design.
Figure 4.1 A Common Core standard for Grade 4 Language Arts and measurable performance objectives for the standard

Common Core Standard for Grade 4 Language Arts: L.4.5 Explain the meaning of simple similes and metaphors (e.g., as pretty as a picture) in context.

Measurable Performance Objectives for the Standard:
1. In at least 8 of 10 sentences that each contain an underlined simile or metaphor, write below the sentence a synonym for the figure of speech.
2. Given a 6- to 8-sentence paragraph containing a total of 2 similes and 2 metaphors and a list of meanings for them below the paragraph, circle all 4 figures of speech and write each beside its correct meaning.
3. In 10 short poems, 5 of which contain a simile and 5 of which contain a metaphor, identify the figure of speech correctly in at least four of each set by circling it and writing its meaning below the poem.
ESSENTIAL CHARACTERISTICS. No matter how they are stated, instructional design ob-
jectives should reflect certain qualities. First, there should always be an observable
action of some kind (e.g., write, create) rather than just an internal ability (e.g., under-
stand, know, learn) or a statement of content to be covered (e.g., review three chap-
ters). Second, the focus should always be on the actions of students after instruction,
rather than those of the teacher or student during instruction. Finally, statements
should be so unambiguous that anyone reading them should know exactly what stu-
dents are to do to show they have learned. It is not necessary to state an objective in
only one sentence. Clarity and specificity are the most important qualities for instruc-
tional design objectives, and achieving these qualities may require several sentences
or a series of phrases.
ESSENTIAL COMPONENTS. Objective statements are most helpful for design purposes
when they have certain components. At the minimum, each statement should contain
three items to specify how the student will demonstrate what they have learned: ac-
tion, assessment, and performance level.
• Action. The action the student is required to do is derived from the behavior
identified in the learning map, which designers create in the step before writ-
ing objectives. (See Chapter 3.) For example, one of the outcomes in a 3-D
Drawing Sample Project learning map in Chapter 3 is “Complete 3-D drawing
model.” The obvious action that would demonstrate knowledge of drawing
principles is: “Draw a model.” Actions should always be expressed as observ-
able activities; for example, “design, write, solve, draw, make, choose.” Avoid
action verbs that describe internal conditions that cannot be directly seen and
measured. Examples of these “verbs to avoid” are: understand, know, appreci-
ate, and feel.
• Assessment. The designer must identify the circumstances under which the
student will complete the action. This may include methods, student/instructor
materials, and/or special circumstances that will apply as students show what
they have learned. Many objectives do not require that all four of the following
components be specified in order to make an objective clear enough for design
purposes; it depends on the type of learned behavior and what the designer con-
siders necessary for a valid assessment.
– Methods. The objective should identify the means of assessing the action.
Completing a test or survey, doing a verbal description, performing an activity,
or developing a product all are possible assessment methods. (For details on
assessment method options, see the following section on Essential Criteria for
Selecting Appropriate Assessment Methods.)
– Student materials. Assessment may require that students have additional
materials such as data charts and tables, calculators, dictionaries, or textbooks
available to them. If so, the objective should state them.
– Instructor materials. Materials such as a rubric or a performance checklist
may be required so that instructors can rate or track performance. A rubric is
a scoring guide, and a performance checklist is a list of component tasks or
activities in a performance. (Both will be discussed in more detail in Chapter 5.)
For example, if students do web page layouts, the products might be judged
by a rubric or criterion checklist.
– Special circumstances. Sometimes the objective must include a description
of certain conditions in which the assessment will be done. For example, stu-
dents must do an activity within a certain time limit or without any supporting
materials.
• Performance level. Perhaps the most difficult part of writing an objective is
specifying how well a student will have to do an activity or how much they must
do it to show they have the necessary level of expertise. Designers must decide
what will constitute acceptable performance and specify it. Depending on the
assessment method, there are several ways to express acceptable performance
levels.
– Number correct. Students may need to do a certain number of items or ac-
tivities correctly to demonstrate they have learned. If the assessment method
is a written test, the percentage or number of items required for passing the
test should be stated. If the action is a motor skill such as operating a piece
of equipment, the students may need to do it correctly a certain number of
times.
– Level of accuracy. If the designer knows there will be variation in the action,
the tolerance for this variation should be specified. For example, if an architec-
tural student is required to calculate the weight a structure will bear, a tolerance
range in pounds or ounces must be stated.
– Rating. If the quality of performances or products is measured by a rubric or
checklist, the acceptable rating must be given. For example, in the web page
example, if a rubric is used to assess the quality of the student’s web page de-
sign, the designer would have to specify what would constitute an acceptable
rubric score. If students are to complete a series of activities, the rating may be
how many of the total number they must complete.
See Table 4.2 for examples of objectives that reflect all these components.
TABLE 4.2 Examples of Instructional Design Objective Components

Example 1
• Target behavior from learning map: The student identifies examples of text, images, links, and tables on a web page.
• Action: The student labels elements of a web page.
• Assessment: The student labels a sample page printout randomly selected by the instructor from 10 printouts. On each page, 15 elements are indicated with an arrow and a numbered line. The student must label all parts within 10 minutes. (Spelling does not count.)
• Performance level: The student must correctly label 14 of 15 elements.

Example 2
• Target behavior from learning map: The student classifies sentences as simple, compound, complex, or compound–complex.
• Action: The student identifies sentences in a paragraph as to type.
• Assessment: The paragraph on a computer screen has 15 sentences with at least 2 of each type represented. The instructor assigns a color code for each type. The student codes all 15 sentences within 10 minutes.
• Performance level: At least 14 of the 15 must be correctly coded.

Example 3
• Target behavior from learning map: The student demonstrates the procedure for using AutoCAD to create a 3-D plane in space.
• Action: The student creates a CAD drawing of a structure with 3-D planes.
• Assessment: On an AutoCAD screen, the student draws a roof with the correct size and shape within ten minutes and with no reference materials. The instructor grades with a checklist.
• Performance level: The roof drawing must meet at least 9 out of 10 accuracy and quality criteria on the instructor checklist.

Example 5
• Target behavior from learning map: The student states names for all bones of the shoulder, wrist, and hand.
• Action: The student labels bones on a computer-generated image of the skeleton.
• Assessment: On a computer-screen image of the upper extremity, students enter the name of the bone on the line opposite it, all within 30 minutes, using no reference materials; spelling counts.
• Performance level: Passing score is at least 61 of 64.

Example 6
• Target behavior from learning map: The student executes a typing exercise at 60 WPM.
• Action: The student types a paragraph at 60 WPM.
• Assessment: The student uses Microsoft Word software to type an assigned paragraph. If needed, the instructor will assist with setting up a new Word document. Students will be given the paragraph on paper and a verbal signal to begin and end the test.
• Performance level: The paragraph must contain no more than three typographical errors.
MENTAL SKILLS AND INFORMATION TESTS. In recent years, test formats long used in
education and training, called mental skills and information tests (or simply tests),
such as multiple-choice tests, have come under various kinds of criticism. These are instru-
ments consisting of individual items that are intended as indirect measures of student
abilities. Some educators feel the instruments are overused and are valid measures
of learning primarily for lower-level skills. However, tests remain the most com-
monly used assessments in education and training, and many educators feel that, when
properly applied and developed, they can effectively assess learning at many different
levels. Although most of these methods require a relatively simple external response
from the student, they can require a complex internal process. For example, the multiple-
choice example in Table 4.3 requires only that the student read the item and circle
a choice. However, in order to get the item correct, the student must first solve
a complex problem. Another criticism of true/false, multiple-choice, and matching
formats is that students can get some correct by guessing. However, several tech-
niques are used to address this potential problem. For example, in a multiple-choice
test, designers may require a minimum number of correct items and can provide carefully
crafted wrong answers, or distractors, based on answers that result from common incorrect
processes.
ATTITUDE SURVEYS. When the objective of the instruction is to change students’ per-
ceptions or behavior, Likert scales or semantic differentials ask them how they feel
about a topic or what they would do in a given situation. Likert scales are assessments
that ask the degree to which one agrees with statements, and semantic differentials
are assessments that ask where one’s views of something fall between a set of bipolar
adjectives. (Both will be discussed in more depth in Chapter 5.) Of course, we can
never be certain that what students say they will do on attitude measures is what they
actually will do. For example, a survey found a disconnect between what students say
they want to eat and what university food-service managers observed them choosing
to eat. The students said they wanted to eat healthy food like salads and fruit; how-
ever, the most popular foods were pizza and hamburgers (Farrell, 2002). Because most
actions cannot be observed so directly, attitude measures remain the most useful way
to infer students’ likely performance and, thus, to indicate whether the instruction has
had the desired impact.
this is not feasible, you must choose a less direct method: asking them ques-
tions about what they will do in the future. In circumstances where there are
many learners to assess and time is an important factor, most assessments must
be indirect measures. However, the idea is to choose the most direct way that
is also logistically feasible to carry out in the setting in which instruction will
take place. When confronted with more than one way to assess individuals in-
directly (e.g., a matching versus a multiple-choice test), choose the one that is
the most direct measure of the performance learners would do in “real-world”
environments.
• Guideline #2: Resources required to establish reliability and validity.
Designers must also make decisions based on their estimates of time and person-
nel resources it will take to make sure instruments are valid and reliable. Validity
means an assessment method measures what it is supposed to measure (Gay,
Mills, & Airasian, 2009; Oosterhof, 2009). Reliability means an assessment yields
consistent results over time, over items within the test, or over two or more scor-
ers. (Also see Chapter 5 for a more in-depth discussion of validity and reliability
when developing each type of instrument.)
– Validity. For designers, validity means that an assessment should be
closely matched to the action stated in the objective. To increase validity,
designers try to select an assessment format that requires as little inference
as possible about whether students can actually do the action whenever they
are required to do it. For example, if the objective calls for students to solve
given algebra problems, a mental skills test that requires them to solve sam-
ple problems and indicate answers would be an appropriate way to infer stu-
dents’ skills in solving any and all such problems. However, if the objective
requires students to demonstrate they can analyze real-world situations and
develop complex solutions that require algebra skills, scenario-based problem
solving evaluated by a performance measure such as a rubric or checklist would
be a more valid choice.
TABLE 4.3 Types of Assessment Methods

Category: Mental Skill and Information Tests

Multiple choice
• Description: Questions or “stems” with three to five alternative answers provided for each. Students select the most correct answer by circling or writing the number or letter of their choice.
• Sample action: Identify correct answers to geometry problems.
• Sample item: 1. Which of the points listed below is on a circle with the following equation? (x − 7)² + (y + 3)² = 25
  A. (10, 1)  B. (17, 12)  C. (−8, −23)  D. (5, −6)

True/false or yes/no
• Description: Statements that the student must decide are accurate or not and write or circle true or false or a similar indicator (e.g., yes/no, correct/incorrect, right/wrong, plus/minus).
• Sample action: Identify whether or not something is a prime number.
• Sample item: Tell whether or not each of the following numbers is a prime number by circling T if it is and F if not:
  T F 1. 92
  T F 2. 650

Fill in the blank (completion)
• Description: Statements that each have a word or phrase omitted that the student must insert.
• Sample action: Analyze a sales report to determine important items of information.
• Sample item: The report reflects that the company’s best customer in the first half of the year was _____.

Short answer
• Description: A set of questions, each of which the student answers with a word or brief phrase.
• Sample action: Identify the German verb form that is appropriate for each sentence.
• Sample item: Wie _____ es Ihnen? (gehen)

Matching
• Description: Two sets of related items; the student connects them by writing one beside the other or writing one’s letter beside the other’s number.
• Sample action: Identify the area of the library where a given library item may be found.
• Sample item: List of materials and list of library areas.

Category: Performance Measures

Essay (usually assessed by rubric; see description below under Performance Measures)
• Description: A statement or question that requires a structured but open-ended response; students write several paragraphs or pages.
• Sample action: Describe an instance when the constructivist teaching technique would be an appropriate choice and describe the strategy that would be appropriate for that situation.
• Sample item: Give an example of an instructional objective for which a constructivist teaching technique would be appropriate, describe the technique, and give three reasons it would be appropriate for the objective. (Graded by an attached rubric.)

Procedures checklist
• Description: A list of steps or activities students must complete successfully.
• Sample action: Demonstrate the procedure for using a digital camera to take a photo.
• Sample item:
  ______ 1. Turn on the camera.
  ______ 2. Adjust the settings, etc.

Performance or product rating scale
• Description: A list of criteria that students’ products or performances must meet. Each criterion may be judged by a “yes/no” standard or by a level of quality (e.g., 1, 2, or 3; low, medium, high).
• Sample action: Develop a multimedia presentation that meets all criteria for content, instructional design, organization/navigation, appearance, and graphics/sound.
• Sample item: An example item for a multimedia product:
  Scale: 3 = High, 2 = Acceptable, 1 = Unacceptable
  _____ All content information is current.
  _____ All information is factually accurate, etc.

Performance or product rubric
• Description: A set of elements that describe a performance or product together with a scale (e.g., one to five points) based on levels of quality for each element.
• Sample action: Develop a PowerPoint presentation to present research findings.
• Sample item: See examples at Kathy Schrock’s Guide to Everything website: http://www.schrockguide.net/assessment-and-rubrics.html

Category: Attitude Measures

Likert scale
• Description: A set of statements; students must indicate a level of agreement with each one.
• Sample action: Demonstrate a willingness to use the company’s Information Hotline to ascertain company policy and procedure on important personnel issues.
• Sample item: I am likely to use the Hotline when I am faced with a possible case of employee theft. Circle your choice: SA A U D SD

Semantic differential
• Description: Sets of bipolar adjectives, each of which may describe an item, person, or activity; each pair is separated by a set of lines or numbers; students mark one to indicate a level of feeling on the continuum from one to the other.
• Sample action: Demonstrate a positive attitude toward working with people of many cultures.
• Sample item: When I think about working with people from a culture other than my own, I feel:
  Good _ _ _ _ _ Bad
  Happy _ _ _ _ _ Sad
  etc.
the following step-by-step approach, breaking down each objective into distinct com-
ponents and writing each one before going back to refine each statement into a final
objective. This forces them to consider each component carefully, focusing on the es-
sential attributes of each one. However, if you are more comfortable working outside
a table, you may do that.
• Review the learning map. In the Instructional Analysis step, you prepared a
learning map, analyzed learner needs, grouped the behaviors on the map into
learning segments each with a behavior to be measured, and decided on a se-
quence for teaching the segments. Now you should review the skills or steps that
lead up to learning and/or doing the behaviors. Some or all of these behaviors
will become instructional design objectives.
• List the target behaviors. The first step in writing objectives for each segment
is either to create a table similar to the one in Table 4.2 and enter the target be-
haviors into the first column, or to simply make a list of the target behaviors.
• Decide on an action, assessment method, and performance level to dem-
onstrate the first behavior. After deciding on the most direct way to assess
that the learner can do the behavior and carefully considering validity, reliability,
instrument preparation time, and administration and scoring logistics, decide on
assessment and performance level components for the first objective. Enter these
into a table or write them next to the behavior.
• Create the objective statement. After completing the components of each ob-
jective, go back and review each objective and make any corrections necessary to
make it into a final statement. Finally, write completed statements of the objectives.
• Repeat the process for the other objectives. As you write the statements, you
may realize that some behaviors can be combined into one objective. If neces-
sary, rewrite the statements to reflect the combined behaviors.
enough different examples to confirm they can identify any and all sentences as
fact or opinion.
– Incorrect assessment. Select an example of fact and opinion. Give students
a newspaper story written at their grade level with all sentences numbered.
Under the paragraph, they must write the number of one sentence that is fact
and one that is opinion.
– Correct assessment. Select an example of fact and opinion. Students are
given a newspaper story written at their grade level with all sentences num-
bered. Under the paragraph, they must write the numbers of five sentences
that are fact and five that are opinion. All 10 must be correctly labeled.
• The performance level criterion is not appropriate for the type of action
and/or the assessment. The “accuracy” criterion relates to amounts and num-
bers (e.g., all items on a test are correct), but the action does not have items; it
must be assessed by requiring certain steps.
– Incorrect performance level. Develop a plan for taking care of a given plant
in a way that will ensure it survives. The plan must be done with 100 percent
accuracy.
– Correct performance level. Develop a plan for taking care of a given plant
in a way that will ensure it survives. The plan must reflect appropriate ways to
address each of the five care criteria.
• The performance level criterion is not realistic; it leaves no room for error.
Because readings from a temperature probe are likely to fluctuate, demanding
exact readings is not realistic.
– Incorrect performance level. Use a graphing calculator and temperature
probe to take readings of liquids. Readings of graph output must be exact.
– Correct performance level. Use a graphing calculator and temperature
probe to take readings of liquids. Readings of the graph output must be cor-
rect within a range of ± .01.
Chapter 4 Summary
• Objectives can serve several kinds of useful instructional roles (e.g., guides for reading, targets for students), but objectives for instructional design purposes are written to make sure required postinstruction performances align with assessments and instruction. Objectives also differ from content area standards; more than one objective may be needed to measure a standard.
• Clarity and specificity are essential qualities for instructional design objectives. All such objectives must be in terms of what students will be able to do and must specify the desired action the student will do post-instruction, as well as the assessment conditions and circumstances under which they must do it and the performance criterion they must meet (e.g., number of items correct or level of accuracy).
• Types of assessment methods include: mental skills and information tests (e.g., multiple choice, true/false, fill-in-the-blank, matching, short answer, essay), performance measures (graded by checklists and rubrics), and attitude surveys. Guidelines for selecting the most appropriate format include: directness of measure as a reflection of real-world performance; resources required to establish validity and reliability; and logistics required for instrument development, administration, and scoring.
• Procedures for writing instructional design objectives include: reviewing behaviors in the learning map; listing the target behaviors; deciding on an action, assessment method, and performance level to demonstrate the first behavior; creating the objective statement; and repeating the process for each of the other behaviors.
• Common errors and problems in writing objectives include: the action is too vague to be measured; the action focuses on the instructor rather than the student; the action focuses on the students’ learning activities rather than postinstruction activities; the action and/or assessment information are incomplete; the assessment does not match the required action; the assessment does not specify how the action will be measured; the assessment does not require enough to confirm ability; the performance level criterion is not appropriate for the type of action and/or the assessment; and the performance level criterion is not realistic because it leaves no room for error.
References
Farrell, E. (2002, July 12). Students won’t give up their French fries. The Chronicle of Higher Education. Retrieved from http://chronicle.com/weekly/v48/i44/44a03501.htm
Gagné, R., & Briggs, L. J. (1974). Principles of instructional design. New York, NY: Holt, Rinehart, & Winston.
Gay, L. R., Mills, G. E., & Airasian, P. (2009). Educational research: Competencies for analysis and application (9th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Mager, R. (1962). Preparing instructional objectives. Belmont, CA: Fearon.
Oosterhof, A. (2009). Developing and using classroom assessments (4th ed.). Upper Saddle River, NJ: Pearson Education, Merrill.
Popham, J. (2011). Classroom assessment: What teachers need to know (6th ed.). Boston, MA: Allyn & Bacon.
Seels, B., & Glasgow, Z. (1998). Making instructional design decisions (2nd ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Waugh, C., & Gronlund, N. (2013). Assessment of student achievement (10th ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Willis, J. (1995). A recursive, reflexive instructional design model based on constructivist-interpretivist theory. Educational Technology, 35(6), 5–23.
Chapter 4 Exercises
Exercise 4.2: Questions for Thought and Discussion—These questions may be used for small group or class discussion or may be subjects for individual or group activities. Take part in these discussions in your in-person class meeting, or use your instructor-provided online discussion area or blog.
a. Willis (1995) says that “In the R2D2 (design) model, specific objectives evolve naturally from the process of design and development . . . it is not important to write specific objectives at the beginning of a (design) project.” Why does the approach that Willis recommends not work for systematic design models? Can you think of any design situations where the R2D2 model would be appropriate?
b. Popham (2011) notes that the standards currently being offered by various content areas (e.g., science, mathematics, history) and by various state departments can be very helpful to those selecting objectives to assess in schools. Give an example from your chosen content area for how standards relate to instructional design objectives.

Exercise 4.3: Design Project Activities and Assessment Criteria—As you prepare instructional design objectives for your product for this course, use the following criterion checklist to assess your work:
_____ 1. Instructional design objectives have been prepared to cover all skills from the learning map that will be included in the instruction.
_____ 2. For each objective, all three required components are specified.
_____ 3. For each objective, the action is in terms of student performance.
_____ 4. For each objective, the assessment method will be a valid, reliable, and practical way to confirm that students have learned the action.
_____ 5. For each objective, the performance level is a reasonable requirement to demonstrate that students have achieved the ability specified in the objective.
Chapter 5
Developing Assessment Materials
Chapter 5 Topics
▪ The purpose and roles of assessments in instruction and instructional design
▪ Types of assessments and purposes of each type
▪ Essential characteristics and components of assessments
▪ How to create various kinds of assessment instruments
▪ Common errors and problems in designing assessments
SCENARIO
Matching up testing, assessment, and evaluation
Wiley, a seventh-grade science teacher, was at a neighborhood party talking with
his friend Matt, a local businessman, who was decrying “the sorry state of education
today.”
“I don’t envy your job, Wiley,” said Matt. “Did you see that newspaper article
yesterday about how many of our kids can’t meet the standards? What is causing all
these problems?” he asked, shaking his head. “These kids today are just hopeless.
I know you teachers work hard and do all you can, so I just don’t understand why our
kids can’t pass those tests. Our economy depends on having well-educated citizens
coming out of schools, and it looks like not many of them will be.”
Wiley cocked an eyebrow and said, “You know, I think your evaluation of the
situation is too pessimistic; things are not as bad as the stories would have you believe.
My kids do just fine on my tests, and I know many of them do great on the state’s
required science tests. I keep a pretty close eye on what happens to my students both
during my classes and afterwards, just so I can improve on something I teach if I need
to. I just feel that these high-stakes tests aren’t the whole answer, and they don’t
always tell what kids really know about a subject.”
Matt was incredulous. “What do you mean by that?” he asked. “Aren’t the tests
matched to state standards? Meeting standards is really important, isn’t it?”
“Yes, of course it is,” said Wiley, “But two things give me pause about those
tests. The first is that they only test kids in one way, usually a long multiple-choice
test. I use a lot of shorter assessments in different formats and at various times in a
course, depending on the kind of learning. Sometimes I have my kids do a lab, and
I use checklists and rubrics to assess how well they do. When I want to see if a kid is
following a particularly difficult concept, I might ask them to explain it to me verbally,
as if they were teaching me. They love that! It also helps me get them back on track if
it becomes clear they really didn’t get it the first time.”
“The other thing,” Wiley continued, “is that those tests put a lot of pressure on the
kids to tell everything they know all at one time.” He laughed, “Sometimes I think all
they’re testing that way is a kid’s stamina! When I give a major test, and it’s a bad day for
a kid for whatever reason—everyone can have a bad day—I give them another chance
to pass it. The state doesn’t have the resources to do any of the things I do. So I can’t
help but wonder if they’re measuring the kids’ knowledge of standards as well as I do.”
“I don’t know, Wiley,” said Matt skeptically. “I think you’re too soft on these kids.
It was a lot harder when I went to school. We kids either passed those tests or we
dropped out and went to work, you know?”
Wiley patted his friend’s shoulder reassuringly. “Yes, I know, Matt,” he said smiling.
“Fortunately, times have changed.”
• Offer a valid and reliable way to measure how much learning has
occurred. Instructors should be able to use the instruments with confidence
that they measure what they are intended to measure and that they do so consis-
tently across students and across time.
• Be closely matched to instructional design objectives. The objectives offer
specifications for what should be measured and how students should be asked
to demonstrate they have learned.
• Be designed before instruction is designed and developed. Finally, the
instruments should be created first so that instruction is matched to them, and not
the other way around. This sequence helps confirm that the instruction succeeds in
bringing about the desired changes in behavior, and it provides guidelines for judging
both the success of students and the quality of the instruction itself.
numbers cannot learn long division. Therefore, whether or not students are able to
learn a given objective depends on whether they have learned the knowledge and
skills that are prerequisite to it. The first kind of assessment occurs before instruction
begins and helps teachers determine if students have entry behaviors or skills, which
they need to learn the new objectives but which will not be included in instruction.
The objectives on which this test is based come from the entry behaviors or skills
part of the learning map, which you learned about in Chapter 3. Depending on the
circumstances and how much time is available, this assessment can be formal or infor-
mal, written or verbal. But the implications are so important that the person doing the
assessment should make sure that results provide an accurate enough measure of
what the student knows.
The decision to be made as a result of this assessment is whether or not the stu-
dent knows enough to be able to learn successfully from the planned instruction. If
it becomes clear that one or more students lack some or all entry behaviors or skills,
the likely decision is to give remedial instruction until the students can show they
are ready to proceed to the next step. Note that the purpose of this prerequisite skills
test is different from that of a pretest. Rather than covering material outside the goals of
the instruction, pretests cover exactly the same content as the instruction. In most instructional situa-
tions, pretests are also used as diagnostic tests to determine how much of what is to be
taught the student already knows. Thus, they allow teachers and trainers to determine
if certain students do not need certain parts of the instruction and plan accordingly.
Though instructors are rarely tasked with demonstrating how much learning occurred
as a result of instruction, pretests and posttests serve this role in research studies and
in summative evaluations (see Chapter 10).
ASSESSING LEARNED KNOWLEDGE AND SKILLS. The last type of assessment is the one
most people think of when they think of assessment. It is administered at the end of
a unit or the end of a course. This activity usually results in a decision on what grade
will be awarded and/or whether or not students showed they mastered enough of the
content to receive credit for it. So-called “high-stakes tests,” mentioned in the opening
scenario, are examples of these assessments but are different from most other end-
of-instruction assessments in several ways. First, they are almost always timed and
their administration is standardized so that all students take them under similar
circumstances. This is intended to give all students the same opportunity to show what
they know and, therefore, be more “fair.” Second, they are almost always the type
of tests that can be scored quickly, usually by computer. Finally, the decisions they
enable affect students, teachers, and school systems, and data from them may drive
system-wide changes to instructional approaches. Thus, they have far more impact
than most assessments.
Of the following statements, which best describes the overall purpose that assessments
serve? Circle the letter of the correct answer.
A. Assign numbers in order to quantify or score student performance during instruction
B. Provide a process to gather evidence of student learning to inform instructional
decisions
C. Provide a value judgment of the usefulness or worth of instruction or teaching quality
Place a check by three of the following that are roles assessments should play in instructional
design:
______ 1. Give a valid measure of teaching quality
______ 2. Serve as ways to practice and diagnose
______ 3. Assess learned knowledge and skills
______ 4. Provide standardized measures of performance
______ 5. Identify prerequisite entry knowledge and skills
assessment experts. Waugh and Gronlund (2013) say that it is “the appearance
of being valid” (p. 42). Gay et al. (2009) say that face validity is “the degree to
which a test appears to measure what it claims to measure” (p. 154). They also
note it is sometimes used to mean content validity, which they define as “the
degree to which a test measures an intended content area” (p. 155). Though they
say that a check for face validity offers no sound way of determining an assess-
ment’s value, it is sometimes an initial screening step that should be followed by
more formal steps to validate content. Expert review may be seen as a kind of
face validity and content validity check. Gay et al. (2009) also note that sampling
validity is a kind of content validity that has to do with how well an assessment
represents knowledge or skills in the entire content area, rather than just a part
of it. To establish sampling validity, designers must try to make sure they have
enough items from various parts of the content to make a good measure of the
entire area.
Other characteristics of assessments that can affect validity have to do with
suitability for the students who will use them. For example, tests must be in the
language and at the reading level of students who will use them (unless their
purpose is to measure reading level in a given language), because they can-
not measure anything if students cannot understand what they ask. Also, the
language used in assessments must be compatible with students’ cultural back-
grounds. Famous examples of cultural incompatibility in assessments were items
from early forms of standardized achievement tests that referred to concepts with
which students were often not familiar, such as using “lawns” and “pineapples”
with urban students who had never heard of these things, let alone seen them.
• Reliability. An assessment is reliable if it yields consistent results over time,
over items within the test, and over test scorers. Testing experts look at many different
ways of establishing reliability, including internal consistency and test-retest
reliability, and statistical tests can help establish these qualities. A Cronbach’s alpha,
Spearman-Brown, or Guttman split-half test can help measure internal consistency
(Huck, 2012; Oosterhof, 2009), or the degree to which items designed to measure the
same thing within a test produce similar scores; a Pearson r correlation checks for
test-retest reliability, or whether a student gets similar scores on two successive
administrations of the same test. Sometimes, however, it is not practical to run these
tests, or instructional designers may not have the expertise; instead, they may rely on
expert review to estimate these qualities. With instruments such as rubrics, which are
designed to measure complex behaviors such as writing by describing several levels
of performance on each of their elements (see Essential Characteristics of Rubrics,
later in this chapter) and which require even more subjective judgments, designers
may want to establish inter-rater reliability (Gay et al., 2009), the quality an instrument
exhibits when two or more persons scoring the same products with the instrument are
likely to arrive at the same score. In lieu of a statistical test such as a Pearson r or
Kendall’s coefficient of concordance (Huck, 2012), designers may estimate this by
having the designer and another person familiar with the content and the products
being assessed score the same student samples and then compare the results to see
whether the two sets of scores tend to agree. If they do not, the instrument may be
communicating differently to different experts. Sometimes this can be corrected with
clearer wording, but when more than one rater will score products or performances
with a rubric, training the raters to interpret the instrument as the designer intended is
necessary to ensure inter-rater reliability.
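For designers who are comfortable with a little scripting, the statistics mentioned above can be computed directly from a table of scores. The following is a minimal sketch in Python, assuming item scores are arranged as a students-by-items matrix; the function names and the small sample data are hypothetical and are shown only to illustrate how Cronbach’s alpha and a test-retest Pearson r are calculated, not as a required procedure of this design model.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal consistency for a students-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

def test_retest_r(first_admin, second_admin):
    """Pearson r between total scores from two administrations of the same test."""
    return np.corrcoef(first_admin, second_admin)[0, 1]

# Hypothetical field-test data: 5 students by 4 items, scored 1 (correct) or 0 (incorrect)
item_scores = [[1, 1, 1, 0],
               [1, 0, 1, 1],
               [0, 0, 1, 0],
               [1, 1, 1, 1],
               [0, 1, 0, 0]]
print("Cronbach's alpha:", round(cronbach_alpha(item_scores), 2))

# Hypothetical total scores for the same 5 students on two administrations
print("Test-retest r:", round(test_retest_r([14, 10, 12, 8, 15], [13, 11, 12, 9, 15]), 2))
```

Values near 1.0 on either statistic suggest consistent measurement; the same figures can be obtained from a spreadsheet or a statistics package if scripting is not an option.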
Each type of mental skills and information assessment also has its own es-
sential characteristics. Types of assessments and the essential criteria for each type
are reviewed next. Table 5.2 summarizes these types and gives the essential criteria
for each.
TABLE 5.2 Types of Assessment Instruments and Essential Criteria for Each
example, students are to identify examples and nonexamples of something (e.g., stable
versus unstable chemical compounds). One format that can include many different
items is a paragraph with a number of underlined words, where the student indicates
whether each is or is not correct. For example, the student is asked whether each
underlined word is or is not an example of a verb.
development. They provide a basis for grading by supplying a matrix that describes
what the product looks like at various levels of quality (e.g., poor, acceptable, good,
excellent) on each of several elements or dimensions (e.g., clarity, grammar, mechan-
ics, and organization, and vocabulary). Rubrics came into use in the 1970s for grad-
ing writing samples, but became increasingly popular when alternative assessments
to traditional testing came into use. Creating an effective rubric requires the designer
to identify specific qualities that make the performance acceptable and describe how
these qualities look at various levels of performance.
RUBRIC GENERATORS. Several Internet sites offer free rubric generators. The designer
follows a set of prompts, and then the system creates a rubric that can be printed
out; on some sites, the finished rubric can also be accessed at an online location
(Roblyer & Doering, 2013, p. 150).
Popular rubric generation sites are RubiStar and Rubric Maker.
a range of tools that usually include a set of handheld devices and software that per-
mits a group of students to answer the same question simultaneously, analyzes the
responses, and displays them in summary form for the teacher (Roblyer & Doering,
2013). These tools are especially helpful for displaying embedded practice items and
are often used in conjunction with interactive whiteboards (IWB), or devices that
include a display screen connected to a computer and digital projector. Now that many
students have smartphones that can be used in place of clicker systems to interact with
IWBs, it has become even easier for teachers to insert embedded items for practice or
to diagnose problems.
ONLINE SURVEY SITES. These online tools originally came into common use to gather
survey data, primarily for research purposes, but they may also be used to offer vari-
ous kinds of assessments, including attitude measures and achievement measures.
Online survey tools allow designers to create and implement their own attitude sur-
veys and questionnaires, but the sites may also be used for giving tests online. The
sites provide features that make it quick and easy to design many kinds of items; the
most commonly used formats are multiple choice and Likert scale. After creating an
online instrument, the designer can e-mail an invitation that contains a link to the site,
and people in any location that has an Internet connection can fill in answers to the
items. The survey site automatically collects and organizes data, and the instrument
designer can request the system to display data in charts and graphs. For example, if a
designer wanted to see how people were responding on a given item, the online site
can display that information in a bar chart labeled with percentages of respondents
selecting each answer to each question.
Most online survey tools allow free use for a limited time, or for shorter in-
struments, but if designers want to be able to download a data file of responses,
they must usually pay a usage fee. Commonly used sites include SurveyMonkey and
Zoomerang.
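When designers do download the raw responses rather than relying on a site’s built-in charts, the same kind of percentage summary is easy to produce by hand. The short Python sketch below assumes the responses to a single Likert item have been exported as a list of scale codes; the data and the scale labels are hypothetical and simply mirror the bar-chart summary described above.

```python
from collections import Counter

# Hypothetical exported responses to one Likert item (SA = Strongly Agree ... SD = Strongly Disagree)
responses = ["SA", "A", "A", "U", "D", "SA", "A", "SD", "A", "U"]

scale = ["SA", "A", "U", "D", "SD"]
counts = Counter(responses)

# Percentage of respondents choosing each option, as a survey site's chart would label it
for option in scale:
    print(f"{option}: {100 * counts[option] / len(responses):.0f}%")
```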
Objective 3—Assessment Formats and Tools. Select a technology-enhanced assessment tool to meet each of several assessment needs. Place the letter(s) of the assessment tool (from the list that follows the descriptions) on the line next to each description of an assessment need.

Assessment needs:
______ 1. For a training workshop in statistical analysis, a designer wants to gather pretest data from the workshop participants to determine what they already know. The designer wants to have participants take a pretest from their home locations before training begins, then display a chart of the results on the first day of the workshop.
______ 2. A designer is updating an in-person math workshop to help young adults prepare for an exam to get the equivalent of a high school diploma. She wants to intersperse embedded items throughout one unit to make sure participants are grasping key concepts. She wants to gather responses quickly and be able to see data summaries instantly, so the instructor will be able to provide additional help, when needed.
______ 3. As a designer creates a remedial mathematics unit for community college students, he would like to include a test-item bank the instructor can use to create several different versions of the same test. This will allow students to retake tests, when needed, without taking the same test each time.

Technology-enhanced assessment tools:
A. Computer-based test generators
B. Rubric generators
C. Computer-based testing software
D. Student response systems (clickers)
E. Online survey tools
F. Assessment features of content management systems (CMS)
PROCEDURES FOR DESIGNING MENTAL SKILL AND INFORMATION TESTS. Steps for
designing multiple-choice, true/false or yes/no, matching, or short answer/fill-in-the-
blank tests are generally the same. These steps include:
1. Review the instructional design objective and review all the content to be
sampled.
2. Review the decision that was made when instructional design objectives were
written about the number of items needed for each area of content. This deci-
sion has implications for sampling validity so, if necessary, adjust the number to
reflect better the entire area.
3. Draft the items. In the case of multiple-choice items, determine the most com-
mon errors students are likely to make in order to write good distractors. Make
sure items meet criteria specified earlier for the type of items.
4. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
5. If possible, ask content area experts to review the instrument for: how well the
items reflect knowledge and skills, as specified by the objective; clarity of word-
ing; and accuracy of answers that the designer has identified as correct.
6. If possible, field test the instrument with sample students and revise as needed
before implementing it with larger groups.
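Step 6 is also a good point to run a quick item analysis on the field-test results, since items that nearly everyone answers correctly or that nearly everyone misses are candidates for revised stems or distractors. The sketch below, in Python, assumes the field-test answers have been recorded as each student’s chosen letter for each item along with an answer key; the key and the responses shown here are hypothetical.

```python
# Hypothetical answer key and field-test responses for a four-item multiple-choice draft
answer_key = ["B", "D", "A", "C"]
student_answers = [
    ["B", "D", "A", "C"],
    ["B", "C", "A", "C"],
    ["A", "D", "A", "C"],
    ["B", "D", "B", "C"],
    ["B", "C", "A", "C"],
]

# Proportion of students answering each item correctly (item difficulty)
for i, correct in enumerate(answer_key, start=1):
    p = sum(answers[i - 1] == correct for answers in student_answers) / len(student_answers)
    print(f"Item {i}: {p:.0%} of students answered correctly")
```

Items that nearly all students get right may be too easy to discriminate among learners, and items with very low percentages may signal an unclear stem or implausible distractors; either finding feeds directly into the revision called for in step 6.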
3. Select an appropriate scale for the instrument such as strongly agree, agree,
uncertain, disagree, and strongly disagree.
4. Write clear directions for completing the instrument.
5. Decide on the testing format (see the section in this chapter on Testing Formats
and Tools). Put the instrument in the desired format.
6. If possible, ask one or more content area experts to review the scale to make
sure the items are clear and tied closely to expected behaviors and beliefs.
7. If possible, field test the scale with sample students, check for internal consis-
tency, and revise as needed before implementing it with larger groups.
For semantic differential instruments, design steps include:
1. Decide on the stimulus word, phrase, or sentence. For example, if the designer is
trying to assess changes in attitudes toward mathematics, just the phrase “Doing
math” would suffice. If the assessment addresses changes in attitudes toward people
of another culture, the stimulus might be, “How do you feel when you think of people
who have a different skin color than yours?”
2. Decide on adjective pairs (e.g., good . . . bad; happy . . . sad) and put the instru-
ment in the format with three to seven lines in between the words.
3. Decide on the assessment delivery format.
Objective 4—Designing Assessment Instruments. For one or more of the following types
of assessment, design an example product that meets all required criteria and is appropriate
for the type of skill you want to assess. Be sure to include appropriate directions for using the
instrument. If you and your instructor agree, you can substitute similar or other instruments
for the ones listed below.
1. A ten-item true/false or yes/no test that identifies examples versus nonexamples of an
appropriate content item from your area of expertise (e.g., nouns versus not nouns,
mammals versus not mammals, scenario describing behaviors that are either correct or
not correct)
2. A performance checklist or rating scale for how to balance a checking account
3. A rubric for assessing a web page developed for a technology education class
4. A ten-item Likert scale for assessing attitudes toward gang membership
5. A ten-item multiple-choice test that assesses an appropriate skill from your area of
expertise
(A long-division layout is shown: 31 divided by 5, with 6 written as the quotient and a remainder of 1.)
A. divisor
B. dividend
C. quotient
D. remainder
• In a multiple-choice item, a stem gives an unintended clue to the correct
answer. The word “heroine” gives a hint that the gender of the character is
female.
– Incorrect. Which of the following is Charlotte Bronte’s most famous
heroine?
– Correct. Which of the following is Charlotte Bronte’s most famous character?
A. Jane Eyre
B. Edward Rochester
C. Linton Heathcliff
D. William Crimsworth
• In a multiple-choice item, options give unintended clues as to the correct
answer. The correct option (B) is much longer and more detailed than incor-
rect options. Students may be drawn to it because it stands out.
– Incorrect. Which of these statements describes how “falsifiability” applies to
a hypothesis?
A. The hypothesis is obviously false.
B. If a hypothesis is false, then it is possible to demonstrate its falsehood.
C. The hypothesis may be false.
D. The hypothesis sounds false.
– Correct. Which of these statements describes how “falsifiability” applies to a
hypothesis?
A. The hypothesis has been previously demonstrated to be false.
B. If a hypothesis is false, then it can be demonstrated to be false.
C. It is possible that the hypothesis is false under certain conditions.
D. The hypothesis sounds false; therefore, it is likely to be false.
• In a multiple-choice item, distractors are not ones students are likely to
select. The distractors in the following list are diseases that many people know
from common knowledge (and/or common sense) are not airborne.
– Incorrect. Which of these is an example of an airborne disease?
A. Childhood diabetes
B. Chicken pox
C. High blood pressure
D. Athlete’s foot
– Correct. Which of these is an example of an airborne disease?
A. Typhoid
B. Chicken pox
C. Dengue fever
D. Malaria
• In a true/false or yes/no item, the item uses an absolute term that automati-
cally makes the item likely to be false. In the incorrect example, the word
“all” signals that the item is most likely false.
– Incorrect. During World War II, all French citizens initially supported the
Vichy regime.
– Correct. During World War II, the Vichy regime initially had popular support.
• In a matching item, the directions for how to complete it are not clear. The
directions in the incorrect item are vague as to which list contains the tools
and which contains their uses; also, it is not clear how to show the
matches.
– Incorrect. Directions: Match the tools with their uses.
– Correct. Directions: Match each of the ten tools on the left with the function
it fulfills in electrical work, listed on the right. Write the letter of the tool func-
tion on the line next to the tool.
• In a fill-in-the-blank item, the missing word can be more than one
choice. In the incorrect item, the correct answer could be many different things
ranging from “state” to “popular vacation site.”
– Incorrect. Florida is an example of a/an _____.
– Correct. Florida is an example of a land mass called a/an _____.
• In an essay, the scope of the task and grading criteria are unclear. In the
incorrect example, directions do not specify how much the student should write
or limit what they should write about; also, it is not clear how the work will be
graded.
– Incorrect. Discuss the impact of Manifest Destiny. Do your best work.
– Correct. Explain the concept of Manifest Destiny and describe two expansionist initiatives it was used to justify. Write one paragraph on the definition and one on each initiative (5 points per paragraph = 15 points total).
• In a Likert scale, the scale does not provide a direct response to the item.
In the incorrect item below, the scale does not match the question.
– Incorrect. How likely are you to smoke cigarettes in the future?
Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree
– Correct. I am likely to smoke cigarettes in the future.
Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree
• In a rubric, performances described in cells are not sufficiently distinct
from each other or are trying to describe more than one element at a
time.
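Several of the multiple-choice flaws above can be screened for automatically before an item bank is finalized. The following is a minimal, hypothetical sketch (the function name, the word-count threshold, and the example data are illustrative assumptions, not content from this chapter) of a check that flags items whose keyed option is much longer than its distractors:

# Hypothetical helper for screening multiple-choice items whose keyed
# (correct) option is noticeably longer than the distractors, one of the
# unintended clues described above. The 1.5x word-count threshold is an
# illustrative assumption.
def flag_length_cue(options, correct_index, ratio=1.5):
    """Return True if the keyed option is at least `ratio` times longer,
    in words, than the average distractor."""
    keyed_words = len(options[correct_index].split())
    distractors = [opt for i, opt in enumerate(options) if i != correct_index]
    avg_distractor_words = sum(len(d.split()) for d in distractors) / len(distractors)
    return keyed_words >= ratio * avg_distractor_words

# Example: the "falsifiability" item in its flawed (incorrect) form.
falsifiability_options = [
    "The hypothesis is obviously false.",
    "If a hypothesis is false, then it is possible to demonstrate its falsehood.",
    "The hypothesis may be false.",
    "The hypothesis sounds false.",
]
print(flag_length_cue(falsifiability_options, correct_index=1))  # prints True

A similar screen could look for absolute terms such as "all," "never," or "always" in true/false statements, although such checks supplement rather than replace expert review and field testing.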
______ 3. Essay assignment: Compare the positions of the two U.S. parties during the 1972
presidential election. Grades are based on clarity and detail.
______ 4. True/false item: Social networks are always dangerous places for middle-school
kids.
______ 5. How likely are you to use company-accepted safety procedures you have learned
for situations when you are transferring stock on shelves in the store?
Strongly Agree, Agree, Unsure, Disagree, Strongly Disagree
Chapter 5 Summary
• Assessments may serve many different kinds of purposes in instruction, including: identifying prerequisite entry knowledge and skills, practicing and diagnosing performances, and assessing learned knowledge and skills.
• Essential criteria for effective assessment instruments include both validity (i.e., face validity, content validity, and/or sampling validity) and reliability (internal consistency, test-retest, and/or inter-rater). However, each of the following kinds of assessment instruments must also meet its own essential criteria: multiple-choice tests, true/false or yes/no tests, matching tests, short answer and fill-in-the-blank tests, essay tests, checklists and rating scales, rubrics, Likert scales, and semantic differentials.
• Several kinds of computer-based assessment tools are available to make development and use of assessments more efficient. These include: computer-based test generators, rubric generators, computer-based testing software, student response systems (clickers), online survey sites, and assessment features of online content management systems (CMS). Also, designers may choose to use online sources of existing instruments.
• Procedures for designing mental skill and information tests include: reviewing the instructional design objective and all the content to be sampled; reviewing previous decisions about the number of items needed for each area; drafting the items; and deciding on the testing format. Procedures for designing performance checklists and rubrics include: reviewing the steps required to perform the skill; creating the checklist; assigning points and designing the grading scale; and deciding on the testing format. For rubrics, procedures include: reviewing the skill and listing the elements or dimensions that must be present; deciding what levels of performance are needed and how they should be labeled; writing the description for each cell in the rubric; assigning points and a grading scale; and deciding on the testing format. For Likert scale instruments, design steps include: listing behaviors and/or beliefs to include; writing statements that reflect each behavior or characteristic; selecting an appropriate scale; writing clear directions; and deciding on the testing format (a brief scoring sketch follows this summary). For semantic differentials, design steps include: deciding on the stimulus word, phrase, or sentence; and deciding on adjective pairs. For all instruments, final procedures should include, if possible, asking content area experts to review the instrument and field testing it with sample students.
• Common errors and problems in creating assessment instruments include: in a multiple-choice item, a stem is an incomplete statement or problem or gives an unintended clue to the correct answer; options give unintended clues as to the correct answer; and distractors are not ones students are likely to select. In a true/false or yes/no item, the item uses an absolute term that automatically makes the item likely to be false. In a matching item, the directions for how to complete it are not clear. In a fill-in-the-blank item, more than one answer could correctly fill the blank. In an essay, the scope of the task and grading criteria are unclear. In a Likert scale, the scale does not provide a direct response to the item. In a rubric, performances described in cells are not sufficiently distinct from each other or try to describe more than one element at a time.
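As a concrete illustration of the Likert scale design steps listed above, the sketch below shows one common way responses are converted to a score. It is a minimal example under stated assumptions: the statements, the 5-point coding, and the reverse-scoring of the negatively worded statement are illustrative, not taken from this chapter.

# Minimal sketch of scoring a short Likert scale instrument (illustrative data).
SCALE = {"Strongly Agree": 5, "Agree": 4, "Uncertain": 3,
         "Disagree": 2, "Strongly Disagree": 1}

# Each statement is paired with a flag marking negatively worded items,
# which are reverse-scored so that higher scores mean a more positive attitude.
statements = [
    ("I follow the safety procedures I was taught.", False),
    ("Safety procedures slow me down too much to bother with.", True),
]

def score_response(responses, statements, scale=SCALE):
    """responses: one scale label per statement, in order; returns the mean item score (1-5)."""
    total = 0
    for (text, reverse_scored), label in zip(statements, responses):
        value = scale[label]
        total += (6 - value) if reverse_scored else value  # reverse-score negative items
    return total / len(statements)

print(score_response(["Agree", "Disagree"], statements))  # prints 4.0

Reverse-scoring keeps all items pointed in the same direction before averaging, which is what makes internal-consistency reliability estimates of the kind mentioned above meaningful.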
References
Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.
Brookhart, S. (2013). How to create and use rubrics. Alexandria, VA: Association for Supervision & Curriculum Development (ASCD).
Frisbie, D. (1992). The multiple true-false item format: A status review. Educational Measurement: Issues and Practice, 11(4), 21–26.
Gay, L. R., Mills, G. E., & Airasian, P. (2009). Educational research: Competencies for analysis and application (9th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Huck, S. (2012). Reading statistics and research (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 2(140), 44–53, 55.
Oosterhof, A. (2009). Developing and using classroom assessments (4th ed.). Upper Saddle River, NJ: Pearson Education, Merrill.
Popham, J. (2011). Classroom assessment: What teachers need to know (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Roblyer, M. D., & Doering, A. (2013). Integrating educational technology into teaching (6th ed.). Boston, MA: Pearson, Allyn & Bacon.
Stiggins, R., & Chappuis, J. (2012). An introduction to student-involved assessment for learning (6th ed.). Upper Saddle River, NJ: Pearson Education, Merrill/Prentice Hall.
Waugh, K., & Gronlund, N. (2013). Assessment of student achievement (10th ed.). Columbus, OH: Pearson Education.
Wiggins, G. (1993). Assessment: Authenticity, context and validity. Phi Delta Kappan, 75(3), 200–214.
Young, J. R. (2011, August 12). Professors cede grading power to outsiders—even computers. The Chronicle of Higher Education, 57, A1, 4–5.
Chapter 5 Exercises
Exercise 5.2: Questions for Thought and Discussion—These questions may be used for small group or class discussion or may be subjects for individual or group activities. Take part in these discussions in your in-person class meeting, or use your instructor-provided online discussion area or blog.

a. Wiggins (1993) was an early proponent of so-called "authentic assessment." He felt that traditional methods of student assessment (e.g., multiple-choice tests) cannot measure complex intellectual skills required in real-life situations and that use of such assessments tends to narrow the curriculum to skills that can be tested in this way. He said, "We should be seeking a more robust and authentic construct of 'understanding' and a more rigorous validation of tests against that construct. . . the aim of education is to help the individual become a competent intellectual performer, not a passive 'selector' of orthodox and prefabricated answers" (p. 202). Are there situations for which "traditional" assessments are helpful? Are there situations in which "authentic assessments" are the better choice?

b. Young (2011) reports that allowing computers to grade students' written work (e.g., essays) is becoming more common in higher education. He notes that "Software can grade essays thanks to improvements in artificial-intelligence techniques. Software has no emotional biases, either, and . . . machines have proved more fair and balanced in grading than humans" (p. 4). What might be positive and negative implications of this assessment practice for education and training?

Exercise 5.3: Design Project Activities and Assessment Criteria—As you prepare assessments for your product for this course, use the following criterion checklist to assess your work:

_____ 1. The assessment activity, criteria, and grading strategy in the assessment are closely matched to the instructional design objectives in each case.
_____ 2. Each instrument meets all essential criteria for that type of assessment.
_____ 3. An appropriate testing format has been selected that is accessible to the students.
_____ 4. The instruments have been put into their testing formats (e.g., online, software package).
_____ 5. If possible, the assessment has been subjected to an expert review and/or field test with students to increase validity.