
Types of Scores in Assessment



By L.G. Cohen and L.J. Spenciner, Pearson Allyn Bacon Prentice Hall
Updated on Jul 20, 2010

There are many ways of reporting test performance. A variety of scores can be used when
interpreting students' test performance.

Raw Scores
The raw score is the number of items a student answers correctly without adjustment for
guessing. For example, if there are 15 problems on an arithmetic test, and a student answers 11
correctly, then the raw score is 11. Raw scores, however, do not provide us with enough
information to describe student performance.

Percentage Scores
A percentage score is the percent of test items answered correctly. These scores can be useful
when describing a student's performance on a teacher-made test or on a criterion-referenced test.
However, percentage scores have a major disadvantage: We have no way of comparing the
percentage correct on one test with the percentage correct on another test. Suppose a child earned
a score of 85 percent correct on one test and 55 percent correct on another test. The interpretation
of the score is related to the difficulty level of the test items on each test. Because each test has a
different or unique level of difficulty, we have no common way to interpret these scores; there is
no frame of reference.
To interpret raw scores and percentage-correct scores, it is necessary to change the raw or
percentage score to a different type of score in order to make comparisons. Evaluators rarely use
raw scores and percentage-correct scores when interpreting performance because it is difficult to
compare one student's scores on several tests or the performance of several students on several
tests.
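The arithmetic example above (11 of 15 items correct) can be worked through in a short Python sketch. The function names here are illustrative only and do not come from any test manual.

```python
def raw_score(responses):
    """Count the items answered correctly, with no adjustment for guessing."""
    return sum(1 for is_correct in responses if is_correct)


def percentage_score(raw, total_items):
    """Percent of test items answered correctly."""
    return 100.0 * raw / total_items


# 15 arithmetic problems, 11 answered correctly (the example from the text).
responses = [True] * 11 + [False] * 4
raw = raw_score(responses)
pct = percentage_score(raw, len(responses))
print(raw, round(pct, 1))  # 11 73.3
```

Neither number says anything about how difficult the items were, which is why both need to be converted before scores from different tests can be compared.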

Derived Scores
Derived scores are a family of scores that allow us to make comparisons between test scores.
Raw scores are transformed to derived scores. Developmental scores and scores of relative
standing are two types of derived scores. Scores of relative standing include percentiles, standard
scores, and stanines.
Developmental Scores
Sometimes called age and grade equivalents, developmental scores are scores that have been
transformed from raw scores and reflect the average performance at age and grade levels. Thus,
the student's raw score (number of items correct) is the same as the average raw score for
students of a specific age or grade. Age equivalents are written with a hyphen between years and
months (e.g., 12-4 means that the age equivalent is 12 years, 4 months). A decimal point is
used between the grade and month in grade equivalents (e.g., 1.2 is the first grade, second
month).
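As a small illustration of the two notations, the sketch below reads an age equivalent written with a hyphen and a grade equivalent written with a decimal point. The helper names are hypothetical, and the code assumes the scores arrive as text exactly in the forms described above.

```python
def age_equivalent_to_months(age_equivalent):
    """Convert an age equivalent such as '12-4' (years-months) to total months."""
    years, months = age_equivalent.split("-")
    return int(years) * 12 + int(months)


def parse_grade_equivalent(grade_equivalent):
    """Split a grade equivalent such as '1.2' into (grade, month)."""
    grade, month = grade_equivalent.split(".")
    return int(grade), int(month)


print(age_equivalent_to_months("12-4"))  # 148
print(parse_grade_equivalent("1.2"))     # (1, 2)
```

Grade equivalents are kept as text here because the digit after the decimal point is a month, not a decimal fraction of a year.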
Developmental scores can be useful (McLean, Bailey, & Wolery, 1996; Sattler, 2001). Parents
and professionals find them easy to read and can use them to place a student's performance within
a context. However, because these scores are so easily misinterpreted, parents and professionals
should approach them with extreme caution. There are a number of reasons for criticizing these
scores. First, for a student who is 6 years old and in the first grade, grade and age equivalents
presume that an equal amount of learning occurs in each month of first grade. But, from our knowledge of child
growth and development and theories about learning, we know that neither growth nor learning
occurs in equal monthly intervals. Age and grade equivalents do not take into consideration the
variation in individual growth and learning.
Teachers should not expect that students will gain a grade equivalent or age equivalent of one
year for each year that they are in school. For example, suppose a child earned a grade equivalent
of 1.5, first grade, fifth month, at the end of first grade. To assume that at the end of second
grade the child should obtain a grade equivalent of 2.5, second grade, fifth month, is not good
practice. This assumption is incorrect for two reasons: (1) The grade and age equivalent norms
should not be confused with performance standards, and (2) a gain of 1.0 grade equivalent is
representative only of students who are in the average range for their grade. Students who are
above average will gain more than 1.0 grade equivalent a year, and students who are below
average will progress less than 1.0 grade equivalent a year (Gronlund & Linn, 1990).
A second criticism of developmental scores is the underlying idea that because two students
obtain the same score on a test they are comparable and will display the same thinking, behavior,
and skill patterns. For example, a student who is in second grade earned a grade equivalent score
of 4.6 on a test of reading achievement. This does not mean that the second grader understands
the reading process as it is taught in the fourth grade. Rather, this student just performed at a
superior level for a student who is in second grade. It is incorrect to compare the second grader
to a child who is in fourth grade; the comparison should be made to other students who are in
second grade (Sattler, 2001).
A third criticism of developmental scores is that age and grade equivalents encourage the use of
false standards. A second-grade teacher should not expect all students in the class to perform at
the second-grade level on a reading test. Differences between students within a grade mean that
the range of achievement actually spans several grades. In addition, developmental scores are
calculated so that half of the scores fall below the median and half fall above the median. Age
and grade equivalents are not standards of performance.
A fourth criticism of age and grade equivalents is that they promote typological thinking. The
use of age and grade equivalents causes us to think in terms of a typical kindergartener or a
typical 10-year-old. In reality, students vary in their abilities and levels of performance.
Developmental scores do not take these variations into account.
A fifth criticism is that most developmental scores are interpolated and extrapolated. A normed
test includes students of specific ages and grades, not all ages and grades, in the norming
sample. Interpolation is the process of estimating the scores of students within the ages and
grades of the norming sample. Extrapolation is the process of estimating the performance of
students outside the ages and grades of the normative sample.
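To make the distinction concrete, here is a minimal Python sketch that estimates a grade equivalent from a raw score by straight-line estimation. The norm table is invented for illustration, and real tests may use other estimation methods. Estimates between the tested grades are interpolated; estimates beyond them are extrapolated.

```python
# Hypothetical norms: (grade tested, average raw score at that grade).
NORMS = [(1.0, 10.0), (2.0, 20.0), (3.0, 28.0)]


def estimate_grade_equivalent(raw, norms=NORMS):
    """Return (grade_equivalent, was_extrapolated) using linear estimation."""
    lowest, highest = norms[0][1], norms[-1][1]
    if raw <= lowest:
        (g0, s0), (g1, s1) = norms[0], norms[1]    # extend the first segment
    elif raw >= highest:
        (g0, s0), (g1, s1) = norms[-2], norms[-1]  # extend the last segment
    else:
        for (g0, s0), (g1, s1) in zip(norms, norms[1:]):
            if s0 <= raw <= s1:
                break
    estimate = g0 + (raw - s0) * (g1 - g0) / (s1 - s0)
    was_extrapolated = raw < lowest or raw > highest
    return estimate, was_extrapolated


print(estimate_grade_equivalent(15))  # (1.5, False) interpolated
print(estimate_grade_equivalent(32))  # (3.5, True) extrapolated
```

In this invented example no student in grade 1.5 or 3.5 was ever tested, so both values are estimates rather than observed averages.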
Developmental Quotient
A developmental quotient is an estimate of the rate of development. If we know a student's
developmental age and chronological age, it is possible to calculate a developmental quotient.
For example, suppose a student's developmental age is 12 years (12 years × 12 months in a year =
144 months) and the chronological age is also 12 years, or 144 months. Using the following
formula, we arrive at a developmental quotient of 100.
Developmental quotient = (Developmental age / Chronological age) × 100
(144 months / 144 months) × 100 = 100
But, suppose another student's chronological age is also 144 months and that the developmental
age is 108 months. Using the formula, this student would have a developmental quotient of 75.
Developmental quotient = (Developmental age 108 months / Chronological age 144 months) × 100 = 75
Developmental quotients have all of the drawbacks associated with age and grade equivalents. In
addition, they may be misleading because developmental age may not keep pace with
chronological age as the individual gets older. Consequently, the gap between developmental age
and chronological age becomes larger as the student gets older.
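The two worked examples above can be checked with a few lines of Python. The function name is illustrative, and both ages are assumed to be expressed in months.

```python
def developmental_quotient(developmental_age_months, chronological_age_months):
    """Developmental age divided by chronological age, multiplied by 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be a positive number of months")
    return developmental_age_months / chronological_age_months * 100


print(developmental_quotient(144, 144))  # 100.0
print(developmental_quotient(108, 144))  # 75.0
```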
Scores of Relative Standing
Percentile Ranks A percentile rank is the point in a distribution at or below which the scores of
a given percentage of students fall. Percentiles provide information about the relative standing of
students when compared with the standardization sample. Look at the following test scores and
their corresponding percentile ranks.
Student   Score   Percentile Rank
Delia     96      84
Jana      93      81
Pete      90      79
Marcus    86      75
Jana's score of 93 has a percentile rank of 81. This means that 81 percent of the students who
took the test scored 93 or lower. Said another way, Jana scored as well as or better than 81
percent of the students who took the test.
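The definition can be sketched in Python as the percentage of scores at or below a given score. The ten scores below are invented for illustration; they are not the standardization sample behind the percentile ranks reported above, so the result differs from Jana's.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that fall at or below score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical comparison group of ten students.
sample = [50, 60, 70, 75, 80, 85, 90, 93, 96, 99]
print(percentile_rank(93, sample))  # 80.0
```

Published tests look up percentile ranks in norm tables built from a much larger sample, and some use slightly different conventions for tied scores.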
A percentile rank of 50 represents average performance. In a normal distribution, both the mean
and the median fall at the 50th percentile. Half the students fall above the 50th percentile and
half fall below. Percentiles can be divided into quartiles. A quartile contains 25 percentiles or 25
percent of the scores in a distribution. The 25th and the 75th percentiles are the first and the third
quartiles. In addition, percentiles can be divided into groups of 10 known as deciles. A decile
contains 10 percentiles. Beginning at the bottom of a group of students, the first 10 percent are
known as the first decile, the second 10 percent are known as the second decile, and so on.
The position of percentiles in a normal curve is shown in Figure 4.5. Despite their ease of
interpretation, percentiles have several problems. First, the intervals they represent are unequal,
especially at the lower and upper ends of the distribution. A difference of a few percentile points
at the extreme ends of the distribution is more serious than a difference of a few points in the
middle of the distribution. Second, percentiles do not apply to mathematical calculations
(Gronlund & Linn, 1990). Last, percentile scores are reported in one-hundredths. But, because of
errors associated with measurement, they are only accurate to the nearest 0.06 (six one-
hundredths) (Rudner, Conoley, & Plake, 1989). These limitations require the use of caution
when interpreting percentile ranks. Confidence intervals, which are discussed later in this
chapter, are useful when interpreting percentile scores.
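The unequal intervals can be seen by converting percentile ranks to positions on the baseline of the normal curve. The sketch below assumes normally distributed scores and uses Python's standard `statistics.NormalDist`; the choice of percentiles is only an example.

```python
from statistics import NormalDist


def z_for_percentile(percentile_rank):
    """Position on the normal curve baseline (in SD units) for a percentile rank."""
    return NormalDist().inv_cdf(percentile_rank / 100)


# The same five-point difference, in the middle and at the upper end.
middle_gap = z_for_percentile(55) - z_for_percentile(50)
extreme_gap = z_for_percentile(99) - z_for_percentile(94)
print(round(middle_gap, 2), round(extreme_gap, 2))
```

Under this assumption, five percentile points near the top of the distribution cover several times more of the baseline than five points in the middle.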
Standard Scores Another type of derived score is a standard score. Standard score is the
name given to a group or category of scores. Each specific type of standard score within this
group has its own fixed mean and standard deviation. Because the mean and standard deviation of
a given type are set in advance, standard scores are an excellent way of
representing a child's performance. Standard scores allow us to compare a child's performance on
several tests and to compare one child's performance to the performance of other students.
Unlike percentile scores, standard scores function in mathematical operations. For instance,
standard scores can be averaged. In the Snapshot, teachers Lincoln Bates and Sari Andrews
discuss test scores. As is apparent, standard scores are equal interval scores. The different types
of standard scores, some of which we discuss in the following subsections, are:

1. z-scores: have a mean of 0 and a standard deviation of 1.
2. T-scores: have a mean of 50 and a standard deviation of 10.
3. Deviation IQ scores: have a mean of 100 and a standard deviation of 15 or 16.
4. Normal curve equivalents: have a mean of 50 and a standard deviation of 21.06.
5. Stanines: standard score bands divide a distribution of scores into nine parts.
6. Percentile ranks: point in a distribution at or below which the scores of a given percentage of
students fall.
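The means and standard deviations listed above are enough to convert among these score types. The following Python sketch uses the usual linear conversions from a z-score; the raw score, group mean, and group standard deviation in the example are invented.

```python
from statistics import mean


def z_score(raw, group_mean, group_sd):
    """Distance of a raw score from the group mean, in standard deviation units."""
    return (raw - group_mean) / group_sd


def to_t_score(z):
    return 50 + 10 * z


def to_deviation_iq(z, sd=15):
    """sd is 15 or 16, depending on the test."""
    return 100 + sd * z


def to_nce(z):
    return 50 + 21.06 * z


# Hypothetical test: group mean 30, standard deviation 4, student raw score 34.
z = z_score(34, 30, 4)
print(z, to_t_score(z), to_deviation_iq(z), round(to_nce(z), 2))
# 1.0 60.0 115.0 71.06

# Standard scores are equal-interval, so they can be averaged.
print(mean([100, 115]))  # 107.5
```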

Deviation IQ Scores Deviation IQ scores are frequently used to report the
performance of students on norm-referenced standardized tests. The deviation scores of
the Wechsler Intelligence Scale for Children-III and the Wechsler Individual Achievement Test-II
have a mean of 100 and a standard deviation of 15, while the Stanford-Binet Intelligence
Scale-IV has a mean of 100 and a standard deviation of 16. Many test manuals provide tables that
allow conversion of raw scores to deviation IQ scores.
Normal Curve Equivalents Normal curve equivalents (NCEs) are a type of standard score with a
mean of 50 and a standard deviation of 21.06. When the baseline of the normal curve is divided
into 99 equal units, the percentile ranks of 1, 50, and 99 are the same as NCE units (Lyman,
1986). One test that does report NCEs is the Developmental Inventory-2. However, NCEs are not
reported for some tests.
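The match at 1, 50, and 99 can be checked numerically. This sketch assumes normally distributed scores and uses Python's standard `statistics.NormalDist` to turn a percentile rank into a z-score before applying the NCE mean and standard deviation given above.

```python
from statistics import NormalDist


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (1 to 99) to a normal curve equivalent."""
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 21.06 * z


for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

The two scales agree at 1, 50, and 99 but differ in between. For instance, a percentile rank of 25 corresponds to an NCE of about 36.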
Stanines Stanines are bands of standard scores that have a mean of 5 and a standard deviation of
2. Stanines range from 1 to 9. Despite their relative ease of interpretation, stanines have several
disadvantages. A change in just a few raw score points can move a student from one stanine to
another. Also, because stanines are a general way of interpreting test performance, caution is
necessary when making classification and placement decisions. As an aid in interpreting
stanines, evaluators can assign descriptors to each of the 9 values:
9: very superior
8: superior
7: very good
6: good
5: average
4: below average
3: considerably below average
2: poor
1: very poor
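As a rough illustration, the sketch below assigns a stanine from a z-score and looks up the descriptor. It assumes the commonly used convention in which each of the middle stanines is half a standard deviation wide and stanine 5 is centered on the mean; the source text states only the mean of 5 and standard deviation of 2, so the cut points are an assumption.

```python
import math

DESCRIPTORS = {
    9: "very superior", 8: "superior", 7: "very good", 6: "good",
    5: "average", 4: "below average", 3: "considerably below average",
    2: "poor", 1: "very poor",
}


def stanine_from_z(z):
    """Map a z-score to a stanine band (assumed half-SD-wide bands, clipped to 1..9)."""
    band = math.floor(2 * z + 5.5)
    return max(1, min(9, band))


for z in (-3.0, -0.3, 0.0, 0.3, 2.0):
    s = stanine_from_z(z)
    print(z, s, DESCRIPTORS[s])
```

Under this convention, z-scores of 0.2 and 0.3 land in different stanines, which shows how a small difference in performance can move a student from one band to the next.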

Basal and Ceiling Levels


Many tests, because test authors construct them for students of differing abilities, contain more
items than are necessary. To determine the starting and stopping points for administering a test,
test authors designate basal and ceiling levels. (Although these are really not types of scores,
basal and ceiling levels are sometimes called rules or scores.) The basal level is the point below
which the examiner assumes that the student could obtain all correct responses and, therefore, it
is the point at which the examiner begins testing.
The test manual will designate the point at which testing should begin. For example, a test
manual states, "Students who are 13 years old should begin with item 12. Continue testing when
three items in a row have been answered correctly. If three items in a row are not answered
correctly, the examiner should drop back a level." This is the basal level.
Let's look at the example of the student who is 9 years old. Although the examiner begins testing
at the 9-year-old level, the student fails to answer correctly three in a row. Thus, the examiner is
unable to establish a basal level at the suggested beginning point. Many manuals instruct the
examiner to continue testing backward, dropping back one item at a time, until the student
correctly answers three items. Some test manuals instruct examiners to drop back an entire level,
for instance, to age 8, and begin testing. When computing the student's raw score, the examiner
includes items below the basal point as items answered correctly. Thus, the raw score includes
all the items the student answered correctly plus the test items below the basal point.
The ceiling level is the point above which the examiner assumes that the student would obtain all incorrect
responses if the testing were to continue; it is, therefore, the point at which the examiner stops
testing. "To determine a ceiling," a manual may read, "discontinue testing when three items in a
row have been missed."
A false ceiling can be reached if the examiner does not carefully follow directions for
determining the ceiling level. Some tests require students to complete a page of test items to
establish the ceiling level.
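The scoring rule described above can be sketched in Python. This is a simplified illustration, not any test's actual procedure: it assumes a rule of three consecutive items for both basal and ceiling, that the administered items are numbered consecutively, and that the basal is the earliest run of three correct answers. Real manuals differ on each of these points.

```python
def find_run(administered, items, wanted, run_length):
    """First item of the earliest run of run_length results equal to wanted, else None."""
    run = []
    for item in items:
        if administered[item] == wanted:
            run.append(item)
            if len(run) == run_length:
                return run[0]
        else:
            run = []
    return None


def raw_score_with_basal(administered, run_length=3):
    """administered maps item number -> True (correct) or False (incorrect)."""
    items = sorted(administered)
    basal_start = find_run(administered, items, True, run_length)
    if basal_start is None:
        raise ValueError("no basal level established")
    from_basal = [i for i in items if i >= basal_start]
    ceiling_start = find_run(administered, from_basal, False, run_length)
    if ceiling_start is None:
        raise ValueError("no ceiling level established")
    ceiling_end = ceiling_start + run_length - 1
    credited = basal_start - 1  # items below the basal count as correct
    earned = sum(1 for i in from_basal if i <= ceiling_end and administered[i])
    return credited + earned


# Hypothetical record: testing began at item 12 and stopped after item 20.
record = {12: True, 13: True, 14: True, 15: True, 16: False,
          17: True, 18: False, 19: False, 20: False}
print(raw_score_with_basal(record))  # 16
```

In this record the student is credited with items 1 through 11 and earns five more points on items 12 through 20.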
