National Council on Measurement in Education is collaborating with JSTOR to digitize, preserve and extend
access to Journal of Educational Measurement.
Detecting Differential Item Functioning Using Logistic
Regression Procedures
Hariharan Swaminathan
University of Massachusetts
and
H. Jane Rogers
Teachers College, Columbia University
individuals with the same ability but from different groups do not have the same
probability of success on the item (Hambleton & Swaminathan, 1985). It follows
that no DIF is present if the logistic regression curves for the two groups are the
same, that is, if $\beta_{01} = \beta_{02}$ and $\beta_{11} = \beta_{12}$. If $\beta_{11} = \beta_{12}$ but $\beta_{01} \neq \beta_{02}$, the curves
are parallel but not coincident and hence uniform DIF may be inferred. If
$\beta_{01} = \beta_{02}$ but $\beta_{11} \neq \beta_{12}$, the curves are not parallel and hence the presence of
nonuniform DIF may be inferred.
An alternative but equivalent formulation of the model (1) is
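$$P(u = 1 \mid \theta) = \frac{e^{z}}{1 + e^{z}}, \tag{2}$$
where
$$z = \tau_0 + \tau_1 \theta + \tau_2 g + \tau_3 (\theta g). \tag{3}$$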
The term $\theta g$ is the product of the two independent variables, $g$ and $\theta$. With the
coding given above, the parameter $\tau_2$ corresponds to the group difference in
performance on the item, and $\tau_3$ corresponds to the interaction between group
and ability. In terms of the parameters of the model in Equation 1,
$$\tau_2 = \beta_{01} - \beta_{02}$$
and
$$\tau_3 = \beta_{11} - \beta_{12}.$$
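The equivalence of the two formulations can be checked numerically. The sketch below assumes the coding $g = 1$ for group 1 and $g = 0$ for group 2 (the coding statement itself is not reproduced in this excerpt), which implies $\tau_0 = \beta_{02}$ and $\tau_1 = \beta_{12}$. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def logistic(z):
    """Standard logistic function e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_group_specific(theta, b0, b1):
    """Separate-curve form: one (intercept, slope) pair per group."""
    return logistic(b0 + b1 * theta)

def p_combined(theta, g, tau):
    """Single-equation form z = tau0 + tau1*theta + tau2*g + tau3*(theta*g)."""
    t0, t1, t2, t3 = tau
    return logistic(t0 + t1 * theta + t2 * g + t3 * theta * g)

# Hypothetical item parameters (illustrative values only).
b01, b11 = 0.5, 1.2    # group 1 intercept and slope
b02, b12 = -0.1, 0.9   # group 2 intercept and slope

# Assumed coding: g = 1 for group 1, g = 0 for group 2.
tau = (b02, b12, b01 - b02, b11 - b12)

# Largest disagreement between the two forms over a grid of abilities.
max_gap = 0.0
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    gap_1 = abs(p_group_specific(theta, b01, b11) - p_combined(theta, 1, tau))
    gap_2 = abs(p_group_specific(theta, b02, b12) - p_combined(theta, 0, tau))
    max_gap = max(max_gap, gap_1, gap_2)

print(max_gap < 1e-12)
```

Under this coding, a nonzero third element of `tau` with a zero fourth element shifts the group 1 curve without changing its slope, which is the uniform DIF case.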
Note that interaction terms between $x_k$ and $g$ are not included. With this
formulation, the logistic model can be expressed as
$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \sum_{k=1}^{m-1} \beta_k x_k + \tau g. \tag{6}$$
In this case, $\tau = \log \alpha$, where $\alpha$ is the common odds ratio defined in Holland and
Thayer (1988). Testing the hypothesis that $\tau = 0$ is now equivalent to testing the
hypothesis that $\alpha = 1$, the hypothesis tested by the Mantel-Haenszel procedure.
The Mantel-Haenszel procedure can therefore be thought of as being based on
a logistic regression model where the ability variable is considered discrete and
where no interaction between the ability variable and the group variable is
specified. Thus, in the language of experimental design models, the logistic
regression model given by Equations 2 and 3 corresponds to an analysis of
covariance model while the Mantel-Haenszel model given by Equation 6
corresponds to a randomized block design model. Despite the similarity between the
Mantel-Haenszel model and the logistic regression model, it should be noted that
the Mantel-Haenszel statistic for testing the hypothesis $H_0\colon \alpha = 1$ (or $\tau = 0$) was
derived using arguments different from those given in the next section.
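To make the link between $\alpha$ and $\tau$ concrete, the following sketch computes the standard Mantel-Haenszel estimate of the common odds ratio from the 2 × 2 tables formed at each score level, and then takes its logarithm to place it on the scale of $\tau$. The estimator $\sum_j (A_j D_j / T_j) / \sum_j (B_j C_j / T_j)$ is the usual Mantel-Haenszel form; the function name, the reference/focal labels, and the counts are assumptions introduced for illustration and do not come from this excerpt.

```python
import math

def mh_common_odds_ratio(tables):
    """Mantel-Haenszel estimate of the common odds ratio.

    Each table is (a, b, c, d) for one score level:
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        if total == 0:
            continue  # an empty score level contributes nothing
        num += a * d / total
        den += b * c / total
    return num / den

# Hypothetical counts at three score levels (illustration only).
tables = [
    (20, 30, 10, 40),
    (35, 15, 25, 25),
    (45, 5, 40, 10),
]

alpha_hat = mh_common_odds_ratio(tables)
tau_scale = math.log(alpha_hat)  # alpha = 1 corresponds to tau = 0
print(round(alpha_hat, 3), round(tau_scale, 3))
```

Because each score level enters as its own table, ability is treated as a discrete blocking variable, and no group-by-ability interaction appears anywhere in the computation.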
Test of Hypothesis
The parameter $\tau_2$ indicates the mean group difference in performance on the
item, and $\tau_3$ indicates the interaction between group and ability. If $\tau_2$ is nonzero
while $\tau_3$ is zero, we can infer uniform DIF. If $\tau_3$ is nonzero, whether or not $\tau_2$ is
zero, we can infer nonuniform DIF. The hypotheses of interest are therefore that
$\tau_2 = 0$ and $\tau_3 = 0$. These two hypotheses can be tested simultaneously as
$$H_0\colon C\tau = 0$$
against
$$H_A\colon C\tau \neq 0, \tag{13}$$
where
$$C = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The statistic for testing the joint hypothesis is
$$\chi^2 = \hat{\tau}' C' (C \hat{\Sigma} C')^{-1} C \hat{\tau}, \tag{14}$$
which has the $\chi^2$ distribution with 2 degrees of freedom. When the test statistic
given by Equation 14 exceeds $\chi^2_{\alpha;2}$, the hypothesis that there is no DIF is rejected.
The item can then be flagged for further study by subject matter specialists.
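A minimal sketch of the joint test follows. Because $C$ simply selects $\tau_2$ and $\tau_3$, the quadratic form in Equation 14 reduces to a 2 × 2 problem that can be written out without a matrix library. The function names, the estimates, and the covariance values are hypothetical; the .01 level is used only because it is the level reported with Table 2. The critical value uses the fact that a $\chi^2$ variable with 2 degrees of freedom has survival function $e^{-x/2}$.

```python
import math

def dif_wald_statistic(tau_hat, cov):
    """Statistic tau' C' (C Sigma C')^{-1} C tau for H0: tau2 = tau3 = 0.

    tau_hat: estimates [tau0, tau1, tau2, tau3]
    cov:     4x4 estimated covariance matrix of tau_hat (list of lists)
    """
    # C = [[0, 0, 1, 0], [0, 0, 0, 1]] picks out tau2 and tau3, so
    # C tau is (tau2, tau3) and C Sigma C' is the lower-right 2x2 block.
    t2, t3 = tau_hat[2], tau_hat[3]
    s22, s23 = cov[2][2], cov[2][3]
    s32, s33 = cov[3][2], cov[3][3]
    det = s22 * s33 - s23 * s32
    if det <= 0:
        raise ValueError("covariance block must be positive definite")
    # Quadratic form using the explicit inverse of the 2x2 block.
    return (t2 * (s33 * t2 - s23 * t3) + t3 * (-s32 * t2 + s22 * t3)) / det

def chi2_critical_2df(alpha):
    """Upper-alpha critical value of chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

# Hypothetical estimates and covariance matrix (illustration only).
tau_hat = [0.2, 1.0, 0.6, -0.3]
cov = [
    [0.010, 0.000, 0.000, 0.000],
    [0.000, 0.015, 0.000, 0.000],
    [0.000, 0.000, 0.040, 0.005],
    [0.000, 0.000, 0.005, 0.020],
]

stat = dif_wald_statistic(tau_hat, cov)
flag_item = stat > chi2_critical_2df(0.01)
print(round(stat, 2), flag_item)
```

An item for which `flag_item` is true would be passed on for content review, as described above.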
to give the Mantel-Haenszel procedure an adequate chance to detect the DIF and
to determine under what conditions, if any, the Mantel-Haenszel procedure is
sensitive to this type of DIF.
In addition to comparing the detection rates of the two procedures under the
various conditions, as described above, the power of each DIF detection procedure
was studied by carrying out twenty replications of one cell of the design (80
items/500 per group). This cell represented the longest test and largest sample
used and thus was the condition under which both procedures should have the
best chance of detecting both uniform and nonuniform DIF.
Results
The detection rates for the two procedures are presented in Table 1. For the
items with uniform DIF, the two procedures had very similar detection rates,
with the Mantel-Haenszel procedure having a very slight advantage. Both were
able to detect uniform DIF of this size with about 75% accuracy in samples of
250 per group and with 100% accuracy in samples of 500. For nonuniform DIF,
the picture was very different. The Mantel-Haenszel procedure was completely
unable to detect nonuniform DIF under any condition. The logistic regression
procedure detected nonuniform DIF with about 50% accuracy in small samples
and short tests and 75% accuracy in large samples and long tests.
In terms of false positives, the Mantel-Haenszel procedure performed somewhat
better than the logistic regression procedure. With a significance level of
Table 1
Comparison of the Logistic Regression and Mantel-Haenszel Procedures
in the Detection of Uniform and Non-uniform DIF

                                          Number of items flagged
                                        n = 250           n = 500
Test                       Number       per group         per group
length   Item type         of items     MH      LR        MH      LR
40       Uniform               4         3       3         4       4
         Non-uniform           4         0       2         0       2
         False positives      32         0       0         0       1
60       Uniform               6         6       5         6       6
         Non-uniform           6         0       3         0       5
         False positives      48         0       1         0       3
80       Uniform               8         6       6         8       8
         Non-uniform           8         0       4         0       6
         False positives      64         1       2         0       1
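The percentages quoted in the Results can be recovered from the Table 1 counts by simple arithmetic. The sketch below assumes that, after the item count, the four entries in each row are Mantel-Haenszel and logistic regression at 250 per group followed by Mantel-Haenszel and logistic regression at 500 per group; this ordering is inferred from the Results text rather than read from printed column headers. Pooling over the three test lengths is a convenience of the sketch, not the authors' own summary.

```python
# Table 1 counts per test length:
# (number of items, MH n=250, LR n=250, MH n=500, LR n=500).
# The column order is an inference from the Results text.
TABLE_1 = {
    40: {"uniform": (4, 3, 3, 4, 4),
         "nonuniform": (4, 0, 2, 0, 2),
         "false_positive": (32, 0, 0, 0, 1)},
    60: {"uniform": (6, 6, 5, 6, 6),
         "nonuniform": (6, 0, 3, 0, 5),
         "false_positive": (48, 0, 1, 0, 3)},
    80: {"uniform": (8, 6, 6, 8, 8),
         "nonuniform": (8, 0, 4, 0, 6),
         "false_positive": (64, 1, 2, 0, 1)},
}
COLUMNS = ("MH, n=250", "LR, n=250", "MH, n=500", "LR, n=500")

def pooled_rates(kind):
    """Percent of items flagged, pooled over the three test lengths."""
    total = sum(TABLE_1[length][kind][0] for length in TABLE_1)
    flagged = [sum(TABLE_1[length][kind][col] for length in TABLE_1)
               for col in range(1, 5)]
    return {name: 100.0 * count / total for name, count in zip(COLUMNS, flagged)}

for kind in ("uniform", "nonuniform", "false_positive"):
    rates = pooled_rates(kind)
    print(kind, {name: round(rate, 1) for name, rate in rates.items()})
```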
Conclusion
The logistic regression model described in this paper is more general and
flexible than the model that underlies the Mantel-Haenszel procedure. This
generality, however, comes at some cost. Computations in the Mantel-Haenszel
procedure can be carried out quickly and inexpensively, whereas the logistic
regression procedure is iterative and thus more costly. (Our experience has shown
that the logistic regression procedure costs 3-4 times as much as the Mantel-
Haenszel procedure.)
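The iterative nature of the estimation can be illustrated with a short Newton-Raphson routine for the four-parameter model $z = \tau_0 + \tau_1 \theta + \tau_2 g + \tau_3 (\theta g)$. This is only a sketch of one common iterative scheme; the excerpt does not say which algorithm the authors used. All function names, the simulated data, and the parameter values are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def score_and_info(tau, rows, u):
    """Gradient of the log-likelihood and the information matrix at tau."""
    k = len(tau)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(rows, u):
        z = sum(t * xi for t, xi in zip(tau, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, info

def fit_dif_logistic(rows, u, max_iter=25, tol=1e-10):
    """Newton-Raphson estimate of (tau0, tau1, tau2, tau3)."""
    tau = [0.0] * 4
    for _ in range(max_iter):
        grad, info = score_and_info(tau, rows, u)
        step = solve(info, grad)  # each iteration solves a 4x4 system
        tau = [t + s for t, s in zip(tau, step)]
        if max(abs(s) for s in step) < tol:
            break
    return tau

# Simulated responses under hypothetical parameter values (illustration only).
random.seed(0)
true_tau = (0.0, 1.0, 0.5, -0.4)
rows, u = [], []
for i in range(1000):
    th, gr = random.gauss(0.0, 1.0), float(i % 2)
    x = (1.0, th, gr, th * gr)
    p = 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(true_tau, x))))
    rows.append(x)
    u.append(1 if random.random() < p else 0)

tau_hat = fit_dif_logistic(rows, u)
print([round(t, 2) for t in tau_hat])
```

Every iteration makes a full pass over the examinees for each item, whereas the Mantel-Haenszel statistic needs only one pass to build the score-level tables, which is consistent with the cost difference noted above.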
A comparison between the logistic regression procedure and the Mantel-
Haenszel procedure showed that the logistic regression procedure is as powerful
as the Mantel-Haenszel procedure in detecting uniform DIF and more powerful
Table 2
Relative Power of the Logistic Regression and Mantel-Haenszel Procedures
for Test Length of 80 and Sample Size of 500 Over Twenty Replications

                                  Percent of Items
                                  Flagged as Biased
                   Number
Item type          of items        MH        LR
Uniform               160          96        94
Non-uniform           160           1        71
False positives      1280           1         4

$\alpha = .01$
References
Bennet, R. E., Rock, D. A., & Kaplan, B. A. (1987). SAT differential item performance
for nine handicapped groups. Journal of Educational Measurement, 24, 41-55.
Bock, R. D. (1975). Multivariate statistical methods. New York: McGraw-Hill.
Hambleton, R. K., & Rogers, H. J. (1989). Detecting potentially biased test items:
Comparison of IRT area and Mantel-Haenszel methods. Applied Measurement in
Education, 2(4), 313-334.
Hambleton, R. K., & Rovinelli, R. J. (1973). A Fortran IV program for generating
examinee response data from logistic test models. Behavioral Science, 17, 73-74.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory. Boston, MA:
Kluwer-Nijhoff.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the
Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test validity
(pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mellenberg, G. J. (1982). Contingency table models for assessing item bias. Journal of
Educational Statistics, 7, 105-108.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53,
495-502.
Shepard, L., Camilli, G., & Averill, M. (1981). A comparison of procedures for detecting
test-item bias with both internal and external ability criteria. Journal of Educational
Statistics, 6, 317-375.
Spray, J., & Carlson, J. (1986, April). Comparison of loglinear and logistic regression
models for detecting changes in proportions. Paper presented at the annual meeting of
the American Educational Research Association, San Francisco.
Authors
HARIHARAN SWAMINATHAN is Professor, School of Education, University of
Massachusetts, Amherst, MA 01003. Degrees: BS, Dalhousie; MS, MEd, PhD,
University of Toronto. Specializations: statistics, psychometrics, item response theory, and
evaluation.
H. JANE ROGERS is Assistant Professor, Teachers College, Columbia University, New
York, NY 10027. Degrees: BA, MEd, University of New England, Australia; PhD,
University of Massachusetts. Specializations: measurement, item response theory,
statistics, and evaluation.