Czech Republic, 26-31 March 2012
Different forms of a test
Item banking
Achievement monitoring
Classical Test Theory:
- Applied only to the equating of different test forms
- The problem is often ignored (the conception of parallel test forms)
- Establishes equivalent scores on different test forms
- Does not create a common scale

Item Response Theory:
- Satisfies all equating needs
- Places all estimates of item and examinee parameters onto a common scale
Equating is a special procedure that establishes a relation between examinee scores on different test forms and places them onto the same scale. As a result, a measure based on responses to one test can be matched to a measure based on responses to another test, and the conclusions drawn about the examinee are identical regardless of which test form produced the measure. Equating of different test forms is called horizontal equating.
The purpose: comparison of student achievement at different grade levels
Test forms are designed to be of different difficulty
Measures from different tests should be placed on the same linear continuum
This test-equating procedure is called vertical equating.
An item bank is a set of items from which test forms that create equivalent measures may be constructed.
An item bank is composed of test items that have been placed onto a common scale, so that different subsets of these items produce interchangeable measures for an examinee.
With an item bank, no further equating is needed.
Both are designed to place estimated parameters onto a common scale.
In test equating, the goal is to place person measures from multiple test forms onto the same scale.
In item banking, the goal is to place item calibrations on the same scale.
The procedures are nearly identical when Rasch measurement is used.
Equating: a procedure that ensures the examinee measures obtained from different subsets of items are interchangeable. When two tests are equated, the resulting measures are placed onto the same scale.
Scaling: a procedure that associates numbers with the performance of examinees. Tests can be scaled identically and yet not be equated.
Applies only to comparing examinee test scores on two different test forms
The problem can be ignored (by introduction of parallel test forms)
Implies only the establishment of a relation between test scores on different test forms
Does not imply creation of a common scale
Linear equating
Equipercentile equating
It is based on equating the standard score on test X to the standard score on test Y:

(x - mu_x) / sigma_x = (y - mu_y) / sigma_y.

Thus, y = A*x + B, where A = sigma_y / sigma_x and B = mu_y - A*mu_x.
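A minimal sketch of the linear-equating transformation in Python; the summary statistics below are invented for illustration only:

```python
# Linear equating: map a score x on test X to the scale of test Y by
# matching standard (z) scores. A is the slope, B the intercept.

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Return the test-Y equivalent of score x via z-score matching."""
    A = sd_y / sd_x          # slope: ratio of standard deviations
    B = mean_y - A * mean_x  # intercept
    return A * x + B

# Example with assumed group statistics (hypothetical values):
y = linear_equate(60, mean_x=50, sd_x=10, mean_y=55, sd_y=12)
```

A score one standard deviation above the X mean is mapped to a score one standard deviation above the Y mean, which is exactly the z-score-matching condition above.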
Scores on tests X and Y are considered equivalent if their respective percentile ranks in any given group are equal.
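The equipercentile rule can be sketched as follows; the two score distributions are invented toy data, and linear interpolation between order statistics is one simple choice among several:

```python
# Equipercentile equating: a score on X is mapped to the Y score that
# has the same percentile rank in the same group of examinees.

def percentile_rank(score, scores):
    """Percent of the group scoring at or below `score`."""
    return 100.0 * sum(s <= score for s in scores) / len(scores)

def equipercentile_equate(x, scores_x, scores_y):
    """Y score whose percentile rank matches that of x on X."""
    pr = percentile_rank(x, scores_x)
    ys = sorted(scores_y)
    # position of the pr-th percentile among the ordered Y scores
    pos = (len(ys) - 1) * pr / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ys) - 1)
    return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])

# Toy score distributions (assumed data):
scores_x = [10, 12, 15, 18, 20, 22, 25, 27, 30, 33]
scores_y = [40, 44, 48, 52, 55, 58, 61, 65, 70, 75]
y_equiv = equipercentile_equate(15, scores_x, scores_y)
```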
Both methods require assumptions about the identity of the test score distributions and the equivalence of the examinee groups.
Equating in CTT does not imply creation of a common scale.
Tests of different content measuring the same trait cannot be equated (but can be scaled in a similar manner).
Invariance of equating results across samples of examinees
Independence of equating results from which test form is used as the reference
Method of common items: the linkage between two test forms is accomplished by means of a set of items common to both test forms
Method of common persons: the linkage between two test forms is accomplished by means of a set of persons who respond to both test forms
Combined methods: the linkage between two test forms is accomplished by means of common items and/or common persons, plus common raters
Internal anchor: each test form has one set of items shared with the other forms and another set of items unique to that form.
External anchor: each test form has an additional set of items that do not belong to these test forms.
All examinees respond to both test forms.
There are two approaches to this design:
- same group / same time
- same group / different time
The linkage between the two test forms is accomplished by means of a set of examinees who respond to all items.
Selecting an equating method
Parameter estimation
Transformation of parameters from different test forms to the same scale
Evaluating the quality of the links between test forms
Simultaneous calibration: all parameters are estimated simultaneously in one run of the estimation software. The data are automatically placed on the same scale.
Separate calibration: parameters are estimated for each test form separately; that is, the data are calibrated in multiple runs of the estimation software.
Separate calibration may be more difficult to accomplish because the test developer needs to transform the measures to a common scale.
Separate calibration of all test forms, with transformation of the measures to the common scale
Simultaneous calibration of all test forms, placing all measures on the common scale
Separate calibration of all test forms with anchoring of the difficulty values of the common items, subsequently placing all parameters on the common scale
As a rule, this procedure is used with the method of common items; the common items are called nodal items in this case.
Each test form is calibrated separately. As a result, the estimates for each test form lie on their own scale. The only difference between the scales is the difference between their origins.
This difference can be removed by calculating the location shift.
It is desirable to have no fewer than 15-20 % nodal items (some of them can be deleted from the link later).
Choice of a common scale
Selection of nodal items
Calibration of all test forms
Calculation of equating constants
Link quality evaluation
Transformation of all parameters onto a common scale
t_12 = (1/l) * SUM_{i=1..l} (delta_i1 - delta_i2),

where t_12 is the shift constant linking test forms 1 and 2; delta_i1 is the difficulty estimate of item i in test form 1; delta_i2 is the difficulty estimate of item i in test form 2; l is the number of common items.
Sometimes other formulas are applied: a weighted mean, a dispersion shift, etc.
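The shift-constant computation can be sketched in Python; the difficulty values below are the five common items from the worked example later in this section:

```python
# Shift constant between two test forms from their common items:
# the mean difference of the common-item difficulty estimates.

def shift_constant(d_form1, d_form2):
    """Mean of (difficulty in form 1 - difficulty in form 2) over common items."""
    assert len(d_form1) == len(d_form2)
    l = len(d_form1)
    return sum(a - b for a, b in zip(d_form1, d_form2)) / l

d1 = [-1.39, -0.93, -2.57, -0.44, 0.88]   # common items, form 1
d2 = [-1.07, -0.54, -1.99, -0.32, 0.96]   # same items, form 2
t12 = shift_constant(d1, d2)              # about -0.298
```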
delta'_i2 = delta_i2 + t_12,

where delta_i2 is the difficulty estimate for item i in test form 2, and delta'_i2 is the difficulty estimate for the same item on the scale of test form 1, i = 1, ..., k, where k is the total number of test items;

theta'_n2 = theta_n2 + t_12,

where theta_n2 is the ability estimate for examinee n who responded to the items of test form 2, and theta'_n2 is the ability estimate for the same examinee on the scale of test form 1, n = 1, ..., N, where N is the total number of examinees who responded to the items of test form 2.

Parameter estimates of test form 2 shifted in this way are placed onto the scale of test form 1.
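Applying the shift constant is then a single addition per estimate; a sketch using the form-2 item difficulties and the shift constant from the worked example in this section:

```python
# Shifting estimates onto the other form's scale: add the shift
# constant to every item difficulty (or person ability) estimate.

t12 = -0.298  # value from the worked example

def shift_to_common_scale(estimates, t):
    """Place item or person estimates onto the reference form's scale."""
    return [e + t for e in estimates]

d2 = [-1.07, -0.54, -1.99, -0.32, 0.96]  # form-2 item difficulties
shifted = shift_to_common_scale(d2, t12)  # first value about -1.368
```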
Link quality is evaluated in two ways:
- item-within-link (fit analysis of the linking items);
- item-between-link (stability of the item calibrations between the two test forms).

The stability of each common item is checked with the standardized difference

U_i = (delta'_i2 - delta_i1) / sigma_i12, where sigma_i12 is defined by sigma_i12^2 = sigma_i1^2 + sigma_i2^2;

sigma_i1, sigma_i2 are the standard errors of measurement for item i from the calibrations of test forms 1 and 2; delta_i1 is the difficulty estimate for item i in test form 1; delta'_i2 is the difficulty estimate for the same item shifted onto the scale of test form 1. If the link is stable, U_i ~ N(0,1).
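The link statistic for a single common item can be sketched as follows, using the values of item 7 from the example table in this section:

```python
# Link-stability check for one common item: the standardized difference
# between its shifted difficulty and its direct estimate should behave
# like a standard normal draw when the link holds.

import math

def link_stat(d1, se1, d2_shifted, se2):
    """U_i = (shifted difficulty - form-1 difficulty) / combined SE."""
    return (d2_shifted - d1) / math.sqrt(se1**2 + se2**2)

# Item 7 from the example table: about 2.0, i.e. a borderline item
u7 = link_stat(d1=-2.57, se1=0.10, d2_shifted=-2.288, se2=0.10)
```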
Item | delta_i1 | SE   | delta_i2 | SE   | Shifted delta'_i2 | U_i
4    | -1.39    | 0.09 | -1.07    | 0.09 | -1.368            |  0.17
6    | -0.93    | 0.10 | -0.54    | 0.09 | -0.838            |  0.69
7    | -2.57    | 0.10 | -1.99    | 0.10 | -2.288            |  2.00
14   | -0.44    | 0.10 | -0.32    | 0.09 | -0.618            | -1.33
20   |  0.88    | 0.12 |  0.96    | 0.11 |  0.662            | -1.34
Sum  | -4.45    |      | -2.96    |      | -4.45             |
Mean | -0.89    |      | -0.592   |      | -0.89             |

Shift constant t_12 = -0.89 - (-0.592) = -0.298.
It implies the creation of a common response matrix for both test forms, containing 1315 examinees and 46 different items.
The measures of all examinees and the difficulty values of all items will be placed on a common scale centered at the mean difficulty of all 46 items.
Calibration of test form 1
Calibration of test form 2 with the difficulty values of the anchor items fixed from the first calibration (the IAFILE= item-anchor specification in Winsteps):

IAFILE=*
4 -1.39
6 -0.93
7 -2.57
14 -0.44
20 0.88
*

As a result, examinee measures from both test forms will be on the scale of the first test form.
Comparison of examinee measures from the three equating procedures revealed approximately similar results: the correlations are close to 1.
The choice of equating procedure is determined by the actual data-collection design and the purpose of the research.