
Creating Valid and

Reliable Assessments
Tracy Salzer
April 2017
Types of Assessments

Formative Assessments    Summative Assessments
Anecdotal records        Final exams
Quizzes and essays       Statewide tests (FCAT)
Diagnostic tests         National tests
Lab reports              Entrance exams (SAT and ACT)
What will you assess, and how will you
assess?

Target Area          Example Target Behavior    Possible Assessments
Knowledge            Spell words correctly      Quizzes, essays, questioning
Reasoning            Solve math problems        Essays, observations
Performance          Speak a foreign language   Observations, rubrics
Product Development  Create a web page          Rubrics
Attitudes            Show positive attitudes    Surveys, observations
The three building blocks of objectives:
Conditions

The Conditions define the materials that will be available (or
unavailable) when the objective is assessed. They generally state
what the student will be given or not given. Example conditions
for objectives might include:

Without the use of a calculator...
Given a map of Europe...
Given twelve double-digit numbers...
Behavior

The Behavior is a verb that describes an observable activity --
what the student will do. The behavior is generally stated as an
action verb, such as: solve, compare, list, explain, evaluate,
identify, define.
Criterion

The Criterion (also referred to as Degree) is the standard that is
used to measure whether or not the objective has been achieved.
The criterion might be stated as a percentage (80% correct), a time
limit (within five minutes), or another measure of mastery. For
example, an objective might be "Given a list of twenty states
(condition), the student will identify (behavior) at least fifteen of
the corresponding state capitals (criterion)."
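Because the criterion is a measurable standard, checking it is simple arithmetic. The sketch below (the function name and numbers are invented for illustration) scores the state-capitals objective above:

```python
# Hypothetical sketch: checking a criterion such as "at least
# fifteen of twenty state capitals identified correctly".
def meets_criterion(correct: int, total: int, required: int) -> bool:
    """Return True when the number of correct responses
    satisfies the objective's criterion."""
    assert 0 <= correct <= total
    return correct >= required

# A student who identifies 16 of 20 capitals meets the criterion;
# one who identifies only 12 does not.
print(meets_criterion(16, 20, required=15))  # True
print(meets_criterion(12, 20, required=15))  # False
```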
Reliability

Reliability refers to the extent to which assessments are consistent. Another
way to think of reliability is to imagine a kitchen scale. If an object weighs
five pounds in the morning, and the scale is reliable, the same scale should
register five pounds an hour later. Likewise, instruments such as classroom
tests and national standardized exams should be reliable: it should not make
any difference whether a student takes the assessment in the morning or
afternoon, one day or the next.

Another measure of reliability is the internal consistency of the items. If you
create a quiz to measure students' ability to solve addition facts, you should
be able to assume that if a student gets an item correct, he or she will also get
other, similar items correct.
Measures of Reliability

Type of Reliability       How to Measure

Stability or Test-Retest  Give the same assessment twice, separated by days,
                          weeks, or months. Reliability is stated as the
                          correlation between scores at Time 1 and Time 2.

Alternate Form            Create two forms of the same test (vary the items
                          slightly). Reliability is stated as the correlation
                          between scores on Test 1 and Test 2.

Internal Consistency      Compare one half of the test to the other half.
Validity

Validity refers to the accuracy of an assessment -- whether or not it measures
what it is supposed to measure. Even if a test is reliable, it may not provide a
valid measure. Let's imagine a scale that consistently tells you that an item
weighs 10 pounds. The reliability (consistency) of this scale is very good, but
it is not accurate (valid) because the object actually weighs 15 pounds.

Since teachers, parents, and school districts make decisions about students
based on assessments (such as grades, promotions, and graduation), the
validity of the inferences drawn from those assessments is essential -- even
more crucial than the reliability. Also, if a test is valid, it is almost
always reliable.
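The scale analogy can be put in numbers. In the sketch below (readings invented), a small spread across repeated weighings shows consistency, while a large gap between the average reading and the true weight shows inaccuracy:

```python
# A numeric version of the scale analogy: reliable (consistent)
# but not valid (inaccurate). Readings are invented.
from statistics import mean, stdev

true_weight = 15.0
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]  # repeated weighings

consistency = stdev(readings)        # small spread -> reliable
bias = mean(readings) - true_weight  # large gap    -> not valid

print(f"spread of readings: {consistency:.2f} lb (reliable)")
print(f"error vs. true weight: {bias:+.2f} lb (not valid)")
```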

There are three ways in which validity can be measured. In order to have
confidence that a test is valid, all three kinds of validity evidence should be
considered.
Measures of Validity

Type of Validity  Definition                                   Example/Non-Example

Content           The extent to which the content of the       A semester or quarter exam that only
                  test matches the instructional objectives.   includes content covered during the last
                                                               six weeks is not a valid measure of the
                                                               course's overall objectives -- it has
                                                               very low content validity.

Criterion         The extent to which scores on the test are   If the end-of-year math tests in 4th
                  in agreement with (concurrent validity) or   grade correlate highly with the
                  predict (predictive validity) an external    statewide math tests, they would have
                  criterion.                                   high concurrent validity.

Construct         The extent to which an assessment            If you can correctly hypothesize that
                  corresponds to other variables, as           ESOL students will perform differently
                  predicted by some rationale or theory.       on a reading test than English-speaking
                                                               students (because of theory), the
                                                               assessment may have construct validity.
Bibliography

Presentation information from:

"Classroom Assessment | Basic Concepts." N.p., n.d. Web. 15
Apr. 2017. <https://fcit.usf.edu/assessment/basic/basicasi.html>.
