
POSTER

TITLE
Differential Item Functioning (DIF) on the IPIP Neuroticism Scale
ABSTRACT
We evaluated the Neuroticism scale of the International Personality Item Pool (IPIP) for
evidence of age- and gender-based differential item functioning (DIF) using the NCDIF, CDIF, and
DTF indices of the DFIT framework in a sample of 23,994 respondents. Results showed scale-mean differences and
significant item-level DIF for 10% of the items.
PRESS PARAGRAPH
The psychological trait of Neuroticism has been shown to be correlated with job performance
and other outcomes in many past studies. We evaluated the Neuroticism scale of the
International Personality Item Pool (IPIP) for evidence of age- and gender-based differential item
functioning (DIF) in a sample of 23,994 respondents. Results showed scale-mean differences and
significant item-level DIF for a sizable number of the items for both age and gender. Our
findings underscore the importance of examining both commercial and public-domain
personality inventories to determine the degree to which their items perform consistently across
demographic subgroups.

Differential Item Functioning (DIF) on the IPIP Neuroticism Scale
For decades, findings of group-mean differences between demographic subgroups
of respondents on psychological test scales have led people to wonder if the scales that
exhibit such differences indeed measure the underlying trait accurately, or if instead they
may be biased against the lower-scoring subgroup (e.g., Cleary, 1968; Lord, 1977; Holland
& Thayer, 1988). In recent years, the increased use of item response theory (IRT) methods
to examine the question of test fairness (e.g., Stark, Chernyshenko, & Drasgow, 2004;
Swaminathan & Rogers, 1990; Thissen, Steinberg, & Wainer, 1993) has led to the general
acceptance of the view that one cannot necessarily assume from the mere presence of mean
differences that a scale is biased against the lower-scoring subgroup. However, one can
still argue that if such scales are used in applied settings to make decisions regarding
employee selection, college admission, or similar purposes that have a significant impact
on the selected (and non-selected) individuals, it is essential to further investigate the
extent to which, and ideally the reasons why, subgroup mean differences are present at
the scale level.
Although the debate over potential psychological test bias has tended to focus
heavily on constructs in the cognitive domain (e.g., Cleary, 1968; Linn, Levine, Hastings,
& Wardrop, 1981; Lord, 1977) due to widespread reports of nontrivial scale-level mean
differences between ethnic subgroups, studies have begun to explore the question of bias in
non-cognitive domains, particularly personality (e.g., Forrest, Lewis, & Shevlin, 2000;
Huang, Church, & Katigbak, 1997; Reise, Smith, & Furr, 2001; Robie, Zickar, & Schmit,
2001; Smith, 2002). Of particular relevance to the present study, reports of the existence of
gender-based subgroup differences in measures of Neuroticism have led researchers to
examine such measures for potential bias in their items (e.g., Reise, Smith, & Furr, 2001;
Smith & Reise, 1998). Given that measures of Neuroticism have been found to negatively
correlate with job satisfaction (Furnham & Zacherl, 1986), management skills (Furnham
& Mitchell, 1991), symptoms of job pressure and dissatisfaction (Kirkaldy, Thome, &
Thomas, 1989), and to positively correlate with work-family conflict (Rantanen,
Pulkkinen, & Kinnunen, 2005) as well as predict aspects of job burnout (Goddard, Patton,
& Creed, 2004; Zellars, Hochwarter, & Perrewé, 2004), we argue that it is important to
further study the question of whether the items or scales used in such assessments may
function differently for different demographic subgroups of individuals.
This study extended prior research by searching for differences in subgroup
functioning on the emotional stability/Neuroticism scales contained in a widely used Five-Factor
model personality measure that has not previously received such scrutiny: the
International Personality Item Pool (IPIP; e.g., Goldberg, 2000). We used the DFIT
framework proposed by Raju, van der Linden, and Fleer (1995), collecting evidence of
subgroup differential functioning at both the test and item levels of analysis. If differential
functioning is seen at significant levels at either the item or test level, such a finding would
have important implications for practitioners and researchers who use the IPIP. That is, the
IPIP is a public-domain, international scientific effort, supported by the efforts of many
scientists dedicated to the continuous improvement of the measurement of human
personality. This research is used for further development and refinement of the IPIP (e.g.,
Goldberg, Johnson, Eber, Hogan, Ashton, Cloninger, & Gough, 2006). If it were to be
found that an appreciable number of IPIP items display DIF, or that the DIF tends to
operate to consistently favor one demographic subgroup versus another, such a finding
should form the basis for subsequent investigations into the critical question of why such
items perform differently in subgroups of respondents.
By employing the DFIT framework, we were able to explore the bottom-line effect
of the removal of any items displaying DIF on the overall level of differential test
functioning (DTF). In the IRT-based approach to assessing test fairness, when mean
scores of two population subgroups differ on a measure, there are two primary types of
explanations. The first is that the test accurately assesses the level of ability in the different
subgroups, but that the two groups truly differ in their mean levels of ability. The other is
that the test is not accurately measuring ability for one of the groups, consistently
overestimating in one and underestimating in the other, because the individual test
items perform differently (in the sense of how they relate to the underlying trait being
measured) depending on the group in question.
Based on past research showing gender-based mean differences on various scales
measuring Neuroticism, including the IPIP, we predicted that our sample of IPIP
respondents would display gender differences at the scale-score level. Specifically:
Hypothesis 1: Mean gender differences will be found on the Neuroticism scale
scores with women scoring higher than men.
We focused on the version of the IPIP (the IPIP-NEO) that is designed to parallel
the constructs and sub-facets of Neuroticism identified by the NEO-PI-R. Accordingly, we
expected to find results that paralleled those reported by Reise, Smith, and Furr (2001) for
the actual NEO-PI-R. That is, using other instruments, Reise et al. and other authors have
examined differential item functioning in scales measuring personality constructs in the
Neuroticism domain, finding differential item functioning in items that are similar to those
used to estimate these constructs in the IPIP-NEO. Hence, we predicted that at least some
of the items on the Neuroticism scale of the IPIP-NEO would exhibit DIF. As with the
NEO-PI-R, the IPIP-NEO is likely to contain items that are more likely for women to
endorse, as well as items that are more likely for men to endorse. These items may tend to
cancel each other out, leading to no finding of cumulative differential functioning at the
test level. Thus, it is hypothesized that:
Hypothesis 2a: The Neuroticism scale of the IPIP will demonstrate significant
differential functioning at the item level.
Hypothesis 2b: The Neuroticism scale of the IPIP will not demonstrate significant
differential functioning at the scale level.
Based on the prior research described above, we further expected that the greatest
number of items displaying DIF would occur in the Anxiety facet. This is consistent with
Reise et al. (2001), and also with Shepperd (1997), who found that the specific item "I often
feel anxious" displayed DIF. Accordingly, regarding the Anxiety facet of the IPIP-NEO:
Hypothesis 3a: The Anxiety facet scale of the Neuroticism scale of the IPIP will
contain the greatest number of items containing DIF when compared to the other facets.
Hypothesis 3b: The Anxiety facet scale of the Neuroticism scale of the IPIP will
contain the greatest amount of differential item functioning when compared to the other
facet scales.
Reise et al. (2001) found that of the items displaying DIF that served to raise
women's scores on the Anxiety facet, two specifically addressed fear-related issues. It can
be argued from a gender-role theoretical standpoint that these items are easier for women
to endorse, as it is more socially acceptable (and thus a part of their socially constructed
gender role) for women to acknowledge and demonstrate fears. Indeed, research has
found gender differences in self-reported level of fear (e.g., Dillon, Wolf, & Katz, 1985),
and other studies specifically investigating confidence in expressing fear found that women
were significantly more confident in expressing their feelings of fear (e.g., Blier &
Blier-Wilson, 1989). Therefore, the following hypothesis was examined:
Hypothesis 4: Fear items on the Anxiety facet scale will be easier for women to
endorse and will display DIF. Specifically, these items are "fear for the worst" and "am
afraid of many things."
Relatively little research has examined personality measures for evidence of DIF on
the basis of age. However, given that scale-mean age differences have been reported for
Neuroticism, using the same rationale noted above with respect to gender, we further
hypothesized that:
Hypothesis 5a: At least some IPIP-NEO items will function differently for different
age-based subgroups.
Hypothesis 5b: At the scale level, any item-level DIF will cancel out so that no
sizable scale-level differences will be seen.
Method
Participants
Substantial numbers of respondents are required for DIF analysis. For example,
sample sizes less than 1,000 for both the referent and focal groups have been found to
impair the performance of NCDIF and CDIF (e.g., Searcy, 1998). For this reason, the
largest available sample of respondents was used for this study. This sample is a
dataset collected by John A. Johnson, consisting of over 20,000 respondents. For our
study, we selected a sample of 23,994 respondents who anonymously completed the 300-item IPIP
measure via the internet between August 6, 1999 and March 19, 2000.
This dataset was previously examined for invalid response protocols, and protocols
were discarded if they showed evidence of duplicate responses, inattentive responding,
greater than average missing responses, and unacceptably low levels of consistency. This
resulted in a final sample of 20,933 respondents. Complete information on the method used
for eliminating protocols is available from Johnson (2005).
The sample consisted of 7,743 (36.9%) males and 13,249 (63.1%) females. All
individuals reported ages between 10 and 99. However, fewer than 1% of the
respondents reported ages over 58. This figure is consistent with what has
been found in other studies utilizing Internet samples (e.g., Gosling, Vazire, Srivastava, &
John, 2004). Due to the limited number of individuals over age 58 who provided data, and
prior research providing evidence that personality stabilizes in later years, individuals over
age 55 were not used in the analysis examining differential functioning by age group.
Instead, the sample was divided into groups as follows: under age 20 (n = 7,333), ages 20 to
30 (n = 8,080), and over 30 to 55 (n = 5,226).
Measure
The IPIP consists of 1,412 items developed by Lewis R. Goldberg, in conjunction
with researchers in the Netherlands and Germany (e.g., Goldberg, 2000). From these items,
scales have been developed to estimate the Big Five domains and the Five Factor Model
(FFM) domains (as measured by the NEO-PI-R), as well as a variety of other scales.
Research investigating the psychometric properties of scales developed from the IPIP has
been supportive and encouraging of their use (e.g., Lim & Ployhart, 2006; Guenole &
Chernyshenko, 2005). For this study, the items identified as estimating the FFM construct
of Neuroticism as measured by the subscales of the NEO-PI-R (Costa & McCrae, 1992)
were used. There are 60 items on this scale, with 10 items corresponding to each of the
subscales of the NEO-PI-R. On the IPIP, these subscales are labeled Anxiety, Anger
(labeled Angry Hostility on the NEO-PI-R), Depression, Self-Consciousness,
Immoderation (labeled Impulsiveness on the NEO-PI-R), and Vulnerability. The items of
the Neuroticism scale of the IPIP-NEO are listed in Table 1. The response options for all
items are 1 = Very Inaccurate, 2 = Moderately Inaccurate, 3 = Neither Inaccurate nor
Accurate, 4 = Moderately Accurate, 5 = Very Accurate. Reflected items (i.e., agreeing with
them indicated lower Neuroticism) were reversed for data analysis.
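The reverse-keying step can be sketched as follows. This is an illustrative sketch only, not the authors' scoring code: the function name `reverse_key` is hypothetical, and items 31 and 37 are used purely as examples of reflected wording. On a 1-5 scale, a reflected response r is rescored as 6 - r.

```python
def reverse_key(responses, reversed_items, low=1, high=5):
    """Return a copy of `responses` with reflected items rescored.

    responses: dict mapping item number -> raw response (low..high)
    reversed_items: set of item numbers keyed toward lower Neuroticism
    """
    scored = {}
    for item, raw in responses.items():
        if not low <= raw <= high:
            raise ValueError(f"response {raw} for item {item} is out of range")
        # Reflected items are mirrored around the scale midpoint.
        scored[item] = (low + high - raw) if item in reversed_items else raw
    return scored


# Hypothetical respondent; items 31 and 37 are treated as reflected here.
raw = {1: 4, 31: 5, 37: 2}
scored = reverse_key(raw, reversed_items={31, 37})
```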
Analyses
Estimation. To test for unidimensionality of the data, exploratory factor analytic
(EFA) procedures were employed, specifically principal components analysis. Both the
scree plot and the percent of variance accounted for by the first factor were examined.
Samejima's (1969) graded response model was used for item parameter estimation,
resulting in one discrimination parameter and four difficulty parameters for each item. The
computer program MULTILOG was used for parameter estimation. Parameters were
calibrated separately for the reference and focal group.
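To make the parameterization concrete, the sketch below evaluates a five-category graded response item with one discrimination (a) and four difficulty (b) parameters. It is a minimal illustration, not MULTILOG's estimation routine: the function names and parameter values are hypothetical, and it assumes the logistic metric without a 1.7 scaling constant.

```python
import math


def boundary_probs(theta, a, bs):
    """P(response in category k or above) for each of the len(bs) boundaries."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]


def category_probs(theta, a, bs):
    """Probability of each of the len(bs) + 1 ordered response categories."""
    cumulative = [1.0] + boundary_probs(theta, a, bs) + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(bs) + 1)]


def expected_score(theta, a, bs, first_category=1):
    """Expected item score with categories scored first_category, first_category + 1, ..."""
    probs = category_probs(theta, a, bs)
    return sum((first_category + k) * p for k, p in enumerate(probs))


# Hypothetical parameters for illustration only (not taken from the tables).
probs = category_probs(theta=0.0, a=1.5, bs=[-2.0, -0.5, 0.5, 2.0])
score = expected_score(theta=0.0, a=1.5, bs=[-2.0, -0.5, 0.5, 2.0])
```

Each category probability is the difference between adjacent boundary probabilities, which is why four b parameters suffice for five response options.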

DIF analysis. Because we were interested in examining differential functioning at
both the item and overall test level for the polytomous measure of interest, the DFIT
framework (Raju, van der Linden, & Fleer, 1995) was used to assess differential item
functioning. DFIT provides additional advantages over other models in that it can detect
uniform and nonuniform DIF, and has been found to show decreased false positive rates
over other DIF detection methods (e.g., Bolt, 2002).
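The DFIT indices can be sketched as averages, over the focal group's theta estimates, of the difference between the expected item scores implied by the focal-group and reference-group parameters. The code below is a simplified illustration under stated assumptions, not the DFIT5 program: the function names and parameter values are hypothetical, the parameters are assumed to be on a common metric already, and the chi-square significance tests are omitted.

```python
import math


def expected_score(theta, a, bs):
    """Expected item score (categories scored 1..len(bs)+1), graded response model."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return sum((k + 1) * (cum[k] - cum[k + 1]) for k in range(len(bs) + 1))


def dfit_indices(focal_thetas, focal_params, ref_params):
    """Return (ncdif, cdif, dtf) averaged over the focal group's theta estimates.

    focal_params / ref_params: lists of (a, bs) per item on a common metric.
    """
    n_items, n = len(focal_params), len(focal_thetas)
    ncdif = [0.0] * n_items
    cdif = [0.0] * n_items
    dtf = 0.0
    for theta in focal_thetas:
        d = [expected_score(theta, *f) - expected_score(theta, *r)
             for f, r in zip(focal_params, ref_params)]
        total = sum(d)                      # test-level difference D
        dtf += total ** 2 / n
        for i, d_i in enumerate(d):
            ncdif[i] += d_i ** 2 / n        # noncompensatory: item i considered alone
            cdif[i] += d_i * total / n      # compensatory: item i's contribution to DTF
    return ncdif, cdif, dtf


# Hypothetical two-item example: item 0 is identical across groups, item 1 is not.
focal = [(1.5, [-2.0, -0.5, 0.5, 2.0]), (1.2, [-1.5, -0.5, 0.4, 1.5])]
ref = [(1.5, [-2.0, -0.5, 0.5, 2.0]), (1.2, [-1.0, 0.0, 0.9, 2.0])]
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
ncdif, cdif, dtf = dfit_indices(thetas, focal, ref)
```

Because the CDIF values sum to DTF, items with opposite-signed differences can offset one another at the test level even when each shows sizable NCDIF.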
Prior to the DIF analysis, all parameters were equated so that the parameters for the
reference group would lie on the same scale as those for the focal group; this was
accomplished using the computer program EQUATE (Baker, 1995), which employs
Stocking and Lord's (1983) iterative test characteristic curve procedure. This characteristic
curve method minimizes differences across groups in the score intervals in the
transformation of the scale, and has been shown to provide improved bias detection over
non-iterative linking methods in both dichotomous models (e.g., Candell & Drasgow,
1998) and polytomous models (e.g., Hidalgo-Montesinos & Lopez-Pina, 2002).
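Once linking constants are available, placing one group's parameters on the other group's metric is a linear transformation. The sketch below shows only that final step, not the characteristic-curve estimation of the constants performed by EQUATE: the function name, the slope `A`, the intercept `B`, and the item parameters are all hypothetical values chosen for illustration.

```python
import math


def to_target_metric(a, bs, slope, intercept):
    """Re-express graded-response parameters on the target metric.

    If theta_target = slope * theta + intercept, the discrimination is
    divided by the slope and each difficulty is transformed like theta.
    """
    return a / slope, [slope * b + intercept for b in bs]


def boundary_probs(theta, a, bs):
    """P(response in category k or above) for each boundary."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]


# Hypothetical linking constants and item parameters, for illustration only.
A, B = 1.1, -0.25
a, bs = 1.4, [-1.8, -0.6, 0.3, 1.5]
a_star, bs_star = to_target_metric(a, bs, A, B)

# The same respondent, expressed on each metric, gets identical probabilities.
theta = 0.7
before = boundary_probs(theta, a, bs)
after = boundary_probs(A * theta + B, a_star, bs_star)
```

The transformation leaves response probabilities unchanged, so any remaining cross-group difference in an item's curves reflects differential functioning rather than a difference in scale.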
It is important to note that if an item is found to have DIF, it can be considered to
have been incorrectly used in the equating process. Therefore, after the DIF analyses were
performed, items found to have significant DIF were removed, the scales re-equated, and
the DIF analyses performed again. NCDIF was considered statistically significant when the
NCDIF index was greater than or equal to 0.096 and the chi-square was statistically significant at the .01
level. These levels of significance were also used for the DTF index. If the results
indicated that DTF was significant, a similar iterative process was used to remove the item
with the largest DIF and re-examine DTF until DTF was no longer significant.
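The re-equating logic described above can be sketched as a loop that flags items, re-estimates the linking constants on the unflagged items, and repeats until the flagged set stops changing. This is a schematic sketch, not the procedure as implemented in EQUATE or DFIT5: `purify`, `equate`, and `compute_ncdif` are hypothetical names, the loop generalizes the single re-equating pass described in the text, the chi-square criterion is omitted, and it assumes items are flagged when NCDIF meets or exceeds the cutoff.

```python
def purify(items, equate, compute_ncdif, cutoff=0.096, max_rounds=10):
    """Iteratively flag DIF items, re-equating on the unflagged (anchor) items.

    equate(anchors) -> linking constants estimated from the anchor items
    compute_ncdif(constants) -> dict mapping every item to its NCDIF value
    """
    flagged = set()
    for _ in range(max_rounds):
        anchors = [i for i in items if i not in flagged]
        constants = equate(anchors)
        values = compute_ncdif(constants)
        newly_flagged = {i for i in items if values[i] >= cutoff}
        if newly_flagged == flagged:
            break                      # flagged set is stable; stop iterating
        flagged = newly_flagged
    return flagged


# Toy stand-ins: item 3 carries a true difference of 0.5, and the "linking
# constant" is simply the mean difference among the current anchor items.
true_dif = {1: 0.0, 2: 0.0, 3: 0.5, 4: 0.0}


def toy_equate(anchors):
    return sum(true_dif[i] for i in anchors) / len(anchors)


def toy_ncdif(shift):
    return {i: (d - shift) ** 2 for i, d in true_dif.items()}


flagged = purify(list(true_dif), toy_equate, toy_ncdif)
```

In the toy example the first pass is contaminated by item 3, which pulls the linking constant away from zero; removing it from the anchor set gives a cleaner second pass.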
Results and Discussion

Mean Differences.
Mean differences and other descriptive statistics are provided in Table 2. Although
mean differences are seen, with women scoring higher than men on the Neuroticism scale,
this difference is not statistically significant (F = 0.064, p = .801). However, these mean
differences are comparable to what has been found in other research. It is interesting to
note that mean gender-based differences are seen in every subscale of the Neuroticism
scale, with mean differences on four out of the six subscales being statistically significant.
The scales with statistically significant mean score differences are: Anger (F = 16.061,
p < .05), Depression (F = 5.653, p < .05), Immoderation (F = 10.411, p < .05), and Vulnerability
(F = 4.180, p < .05). Thus, Hypothesis 1 was supported.
Unidimensionality of Scale
Principal components analysis was used to assess the unidimensionality of the IPIP
Neuroticism scale, and the first factor accounted for 27.93% of the variance, meeting the
criterion of having greater than 20% of the variance be accounted for by the first latent
dimension. The scree plot revealed a clear separation of the first factor from the others,
with the ratio of the first factor to the second factor of approximately 4:1. Although one
might well argue that meaningful higher-dimensionality factor solutions could also be
extracted from these items, based on the results of past research (e.g., Reckase, 1979) we
considered the amount of variance accounted for by the first factor here to be sufficient to
justify the use of this unidimensional IRT model to examine the properties of the
underlying general factor of Neuroticism.
Item Parameter Estimation, Equating, and DFIT
Item parameters were estimated using MULTILOG (Thissen, 1995). Parameters
were estimated separately for males, females, and each age group. The parameters, one a
(discrimination) and four b (difficulty) parameters, for the gender groups are provided in
Tables 3-4, and for the age groups in Tables 5-7. The Fortran-based program EQUATE
(Baker, 1995) provided the linear transformation coefficients used in the DFIT program to
transform the reference group parameters onto the metric of the focal group, allowing cross-group
comparisons. Because the equating process is not considered accurate if DIF items are used
in the equating process, the DIF analysis was repeated after the items identified as
having DIF were removed and the scales re-equated.
Being the larger group numerically, the females were considered the reference
group, and the males were designated the focal group. For the age comparisons, the middle
age group served as the reference group. Hence, the
middle age group was compared against the older group as the focal group, and again
against the younger group as the focal group. The DFIT5 program (Raju, 2003) was used
to estimate the NCDIF, CDIF, and DTF indices. The number of theta estimates was
reduced to 3,000 because the DFIT program allows a maximum of 3,000
(Flowers, personal communication).
Subgroup Comparisons
It was hypothesized that the Neuroticism scale of the IPIP would demonstrate
significant differential functioning by gender at the item level, and as the results in Table 8
indicate, this prediction is supported by the data: six items (i.e., 10% of the full IPIP-NEO
Neuroticism item pool) display significant NCDIF. Hypothesis 2b, that the Neuroticism
scale of the IPIP will not demonstrate significant differential functioning by gender at the
scale level, is also supported, with the DTF index falling below the cutoff value of 5.760
(DTF = 3.263, χ² = 80909.71, p < .01). Although the Anxiety subscale was hypothesized to
have the greatest number and amount of differential item functioning, contrary to
predictions there are no items in that subscale that demonstrate significant NCDIF. Instead,
three items (half of those identified with NCDIF) are contained in the Depression subscale
(i.e., "am often down in the dumps," "feel desperate," and "feel that my life lacks
direction"). Thus, Hypotheses 3a and 3b did not receive support.
Hypothesis 4, which predicted that fear-related items on the Anxiety facet scale
should be easier for women to endorse (thus, being more likely to display DIF), was not
supported. Specifically, the items "fear for the worst" and "am afraid of many things" do
not exhibit NCDIF in this sample.
Hypothesis 5a, that the Neuroticism scale of the IPIP would demonstrate significant
differential functioning by age at the item level, was supported. In the mid/young group
comparison (see Table 9), NCDIF was displayed by two items ("don't know why I do some
of the things I do" and "never spend more than I can afford"). In the mid/older group
comparison (see Table 10), only one item displayed NCDIF ("can't make up my mind").
Hypothesis 5b, that the Neuroticism scale of the IPIP would not demonstrate significant
differential functioning at the scale level, was also supported.
To illustrate the action of the differential item functioning based on gender and age
seen above, Figures 1-8 present boundary response functions for selected items (showing
the cumulative likelihood of a response in category k or above on the multipoint scale),
broken down by the subgroup comparison of interest; Figures 9-13 present category
response curves showing the performance of the various response options for each
subgroup. For example, for item 15 ("am often down in the dumps"), an examination of
the difficulty parameters shows that women (b1 = -1.100, b2 = -0.190, b3 = 0.436, b4 = 1.42) are
less likely to endorse the item than men (b1 = -1.376, b2 = -0.506, b3 = 0.141, b4 = 1.164) having
similar overall theta scores (see Figures 4 and 9). A possible explanation for the finding of
DIF in this item could be the interpretation of "down in the dumps," whereby this
idiomatic phrase could have been interpreted differently (and less desirably) by women
than men. Item 16 ("find it difficult to approach others") was also endorsed less by women
(b1 = -2.23, b2 = -0.423, b3 = 0.436, b4 = 2.44) than men (b1 = -2.499, b2 = -0.976, b3 = -0.139,
b4 = 1.664), as seen in Figures 5 and 10. As is evident from the category response curves, the item displays greater
DIF at the "moderately accurate" and "very accurate" response categories.
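The direction of the item 15 difference can be checked numerically from the reported difficulty parameters. The sketch below is illustrative only: it assumes a single shared discrimination value (2.32, the female estimate for item 15 in Table 3) for both groups so that only the b parameters differ, it assumes the reported values are on a common metric, and it uses the logistic form of the graded response model without a 1.7 scaling constant.

```python
import math


def boundary_probs(theta, a, bs):
    """P(response in category k or above), k = 2..5, graded response model."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]


# Difficulty parameters for item 15 as reported in the text.
B_WOMEN = [-1.100, -0.190, 0.436, 1.42]
B_MEN = [-1.376, -0.506, 0.141, 1.164]
# ASSUMPTION: one shared discrimination value, so only the b parameters differ.
A_SHARED = 2.32

for theta in (-1.0, 0.0, 1.0):
    women = boundary_probs(theta, A_SHARED, B_WOMEN)
    men = boundary_probs(theta, A_SHARED, B_MEN)
    print(theta, [round(p, 3) for p in women], [round(p, 3) for p in men])
```

Because every boundary for men sits at a lower theta than the corresponding boundary for women, the men's cumulative probabilities are higher at any fixed trait level under this shared-discrimination assumption.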
In sum, with respect to our goal of determining the degree to which the DIF-related
issues that have been seen in other Neuroticism scales may also afflict the IPIP-NEO item
pool, the results above show that nontrivial differences are indeed seen on the basis of
gender, and to a lesser extent, age. Although we admittedly do not have a post hoc,
theory-based explanation that can account for the pattern of differential functioning on these
Neuroticism items, this finding should nevertheless serve as the stimulus for subsequent
research to probe this issue further, both to shed light on the causal dynamics of this
differential responding, and to determine whether the test-level impact of such differences
plays any practically significant role in real-world applications in which such scales are
used in selection situations.


References
Baker, F. B. (1995). EQUATE 2.1: Computer Program for Equating Two Metrics
in Item Response Theory [Computer program]. Madison: University of Wisconsin,
Laboratory of Experimental Design.
Blier, M. J. & Blier-Wilson, L. A. (1989). Gender differences in self-rated
emotional expressiveness. Sex Roles, 21, 287-295.
Bolt, D. (2002). A Monte Carlo comparison of parametric and nonparametric
polytomous DIF detection methods. Applied Measurement in Education, 15, 113-141.
Cleary, T.A. (1968). Test bias: Prediction of grades of negro and white students in
integrated colleges. Journal of Educational Measurement, 5, 115-124.
Costa, P.T. & McCrae, R. R. (1992). Four ways the big five are basic. Personality
and Individual Differences, 13, 653-665.
Dillon, K. M., Wolf, E., & Katz, H. (1985). Sex roles, gender, and fear. The
Journal of Psychology, 119, 355-359.
Forrest, S., Lewis, C.A., & Shevlin, M. (2000). Examining the factor structure and
differential functioning of the Eysenck Personality Questionnaire revised-abbreviated.
Personality and Individual Differences, 29, 579-588.
Furnham, A. & Mitchell, J. (1991). Personality, needs, social skills, and academic
achievement: A longitudinal study. Personality and Individual Differences, 12, 1067-1073.
Furnham, A. & Zacherl, M. (1986). Personality and job satisfaction. Personality
and Individual Differences, 7, 453-459.

Goddard, R., Patton, W., & Creed, P. (2004). The importance and place of
Neuroticism in predicting burnout in employment service case managers. Journal of
Applied Social Psychology, 34, 282-296.
Goldberg, L.R. (2000). International Personality Item Pool: A Scientific
Collaboratory for the Development of Advanced Measures of Personality and Other
Individual Differences [online]. Available: http://ipip.ori.org/ipip/.
Goldberg, L.R., Johnson, J.A., Eber, H.W., Hogan, R., Ashton, M.C., Cloninger, C.
R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain
personality measures. Journal of Research in Personality, 40, 84-96.
Gosling, S. D., Vazire, S., Srivastava, S., & John, O.P. (2004). Should we trust
web-based studies? American Psychologist, 59, 93-104.
Guenole, N. & Chernyshenko, O. S. (2005). The suitability of Goldberg's big five
IPIP personality markers in New Zealand: A dimensionality, bias, and criterion validity
evaluation. New Zealand Journal of Psychology, 34, 86-96.
Hidalgo-Montesinos, M.D. & Lopez-Pina, J. A. (2002). Two-stage equating in
differential item functioning detection under the graded response model with the Raju area
measures and the Lord statistic. Educational and Psychological Measurement, 62, 32-44.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the
Mantel-Haenszel procedure. In H. Wainer & H. Braun (Eds.), Test Validity (pp. 129-145).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Huang, C., Church, A.T., & Katigbak, M.S. (1997). Identifying cultural differences
in items and traits: Differential function in the NEO Personality Inventory. Journal of
Cross-Cultural Psychology, 28, 192-218.

Kirkaldy, B., Thome, E., & Thomas, W. (1989). Job satisfaction among
psychosocial workers. Personality and Individual Differences, 10, 191-196.
Lim, B. & Ployhart, R. E. (2006). Assessing the convergent and discriminant
validity of Goldberg's international personality item pool: A multitrait-multimethod
examination. Organizational Research Methods, 9, 29-54.
Linn, R. L., Levine, M.V., Hastings, C. N., & Wardrop, J. L. (1981). Item bias in a
test of reading comprehension. Applied Psychological Measurement, 5, 159-173.
Lord, F.M. (1977). A study of item bias using item characteristic theory. In Y.H.
Poortinga (Ed.), Basic problems in cross-cultural research (pp. 19-29). Amsterdam: Swets
& Zeitlinger.
Lord, F. M. (1980). Applications of item response theory to practical testing
problems. Hillsdale, NJ: Lawrence Erlbaum.
Raju, N., van der Linden, W., & Fleer, P. (1995). IRT-based internal measures of
differential functioning of items and tests. Applied Psychological Measurement, 19,
353-368.
Rantanen, J., Pulkkinen, L., & Kinnunen, U. (2005). The big-five personality
dimensions, work-family conflict, and psychological distress: A longitudinal view.
Journal of Individual Differences, 26, 155-166.
Reckase, M. D. (1979). Unifactor Latent Trait Models Applied to Multi-Factor
Tests: Results and Implications. Journal of Educational Statistics, 4, 207-230.
Reise, S.P. & Hansen, J. M. (2003). A discussion of modern versus traditional
psychometrics, as applied to personality assessment scales. Journal of Personality
Assessment, 81, 93-103.


Reise, S.P., Smith, L., & Furr, R.M. (2001). Invariance on the NEO-PI-R
Neuroticism scale. Multivariate Behavioral Research, 36, 83-110.
Robie, C., Zickar, M., & Schmit, M.K.J., (2001). Measurement equivalence
between applicant and incumbent groups: An IRT analysis of personality scales. Human
Performance, 14, 187-207.
Samejima, F. (1969). Estimation of Latent Ability Using a Response Pattern of
Graded Scores. Psychometrika Monograph Supplement, 34 (4, Pt. 2).
Searcy, C. A. (1998). A Monte-Carlo investigation of the DFIT framework applied
to polytomous data under the graded response model. Unpublished doctoral dissertation,
University of Georgia, Athens, GA.
Shepperd Jr, R.L. (1997). Differential Item Functioning in the Hogan Personality
Inventory. Unpublished dissertation, Central Michigan University.
Smith, L.L. (2002). On the usefulness of item bias analysis to personality
psychology. Personality and Social Psychology Bulletin, 28, 754-763.
Smith, L. L. & Reise, S.P. (1998). Gender differences on negative affectivity; An
IRT study of differential item functioning on the multidimensional personality
questionnaire stress reaction scale. Journal of Personality and Social Psychology, 75,
1350-1362.
Stark, S., Chernyshenko, O.S., & Drasgow, F. (2004). Examining the effects of
differential item functioning and differential test functioning on selection decisions:
When are statistically significant effects practically important? Journal of Applied
Psychology, 89, 497-508.
Swaminathan, H. & Rogers, H. J. (1990). Detecting differential item functioning
using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item
functioning using the parameters of item response models. In Holland, Paul W. & Wainer,
H. (Eds). Differential item functioning. (pp. 337-347). Hillsdale, NJ, US: Lawrence
Erlbaum Associates.
Zickar, M. J. (2002). Modeling data with polytomous item response theory. In N.
Schmitt & F. Drasgow (Eds.), New Advances in Psychometric Methods. Jossey-Bass. San
Francisco, CA.


Table 1
Neuroticism Items
N1: ANXIETY
1. Worry about things.
7. Fear for the worst.
13. Am afraid of many things.
19. Get stressed out easily.
25. Get caught up in my problems.
31. Am not easily bothered by things.
37. Am relaxed most of the time.
43. Am not easily disturbed by events.
49. Don't worry about things that have already happened.
55. Adapt easily to new situations.
N2: ANGER
2. Get angry easily.
8. Get irritated easily.
14. Get upset easily.
20. Am often in a bad mood.
26. Lose my temper.
32. Rarely get irritated.
38. Seldom get mad.
44. Am not easily annoyed.
50. Keep my cool.
56. Rarely complain.
N3: DEPRESSION
3. Often feel blue.
9. Dislike myself.
15. Am often down in the dumps.
21. Have a low opinion of myself.
27. Have frequent mood swings.
33. Feel desperate.
39. Feel that my life lacks direction.
45. Seldom feel blue.
51. Feel comfortable with myself.
57. Am very pleased with myself.
N4: SELF-CONSCIOUSNESS
4. Am easily intimidated.
10. Am afraid that I will do the wrong thing.
16. Find it difficult to approach others.
22. Am afraid to draw attention to myself.
28. Only feel comfortable with friends.
34. Stumble over my words.
40. Am not embarrassed easily.
46. Am comfortable in unfamiliar situations.
52. Am not bothered by difficult social situations.
58. Am able to stand up for myself.
N5: IMMODERATION
5. Often eat too much.
11. Don't know why I do some of the things I do.
17. Do things I later regret.
23. Go on binges.
29. Love to eat.
35. Rarely overindulge.
41. Easily resist temptations.
47. Am able to control my cravings.
53. Never spend more than I can afford.
59. Never splurge.
N6: VULNERABILITY
6. Panic easily.
12. Become overwhelmed by events.
18. Feel that I'm unable to deal with things.
24. Can't make up my mind.
30. Get overwhelmed by emotions.
36. Remain calm under pressure.
42. Can handle complex problems.
48. Know how to cope.
54. Readily overcome setbacks.
60. Am calm even in tense situations.


Table 2
Scale-Score Mean Differences
Scale               Group     Mean       Std. Dev.   S.E.M.
IPIP-N              Males     166.2224   36.81953    .41843
                    Females   180.9564   36.50301    .31713
Anxiety             Males     27.9241    7.75393     .08812
                    Females   31.7973    7.60644     .06608
Anger               Males     27.7967    9.23932     .10500
                    Females   30.3566    8.91351     .07744
Depression          Males     27.7967    9.23932     .10500
                    Females   30.3566    8.91351     .07744
Self-Consciousness  Males     26.6736    9.54256     .10845
                    Females   28.0744    9.33252     .08108
Immoderation        Males     28.3976    7.57191     .08605
                    Females   29.9756    7.60670     .06609
Vulnerability       Males     31.7359    6.79126     .07718
                    Females   33.3920    6.94223     .06031

Table 3
Item Parameter Estimates for Females
Item  a       b1       b2       b3       b4
1     1.34    -3.38    -1.84    -1.12    0.852
2     1.17    -2.11    -0.581   0.361    2
3     1.87    -1.71    -0.581   0.157    1.39
4     1.06    -2.14    -0.496   0.541    2.31
5     0.481   -4.62    -1.92    -0.158   2.88
6     1.6     -1.37    -0.203   0.56     1.85
7     1.62    -1.73    -0.521   0.194    1.46
8     1.38    -2.55    -1.07    -0.268   1.37
9     1.7     -0.786   0.167    0.988    2.21
10    1.28    -2.85    -1.36    -0.514   1.04
11    0.953   -2.91    -1.45    -0.527   1.26
12    1.37    -2.13    -0.568   0.457    2.12
13    1.6     -1.34    -0.021   0.774    2.16
14    1.9     -1.74    -0.499   0.14     1.34
15    2.32    -1.1     -0.19    0.436    1.42
16    0.903   -2.23    -0.423   0.436    2.44
17    1.13    -2.43    -0.77    0.139    1.83
18    1.93    -0.925   0.166    0.814    2.01
19    2.21    -1.71    -0.632   -0.139   0.924
20    2.02    -0.79    0.362    1.09     2.19
21    1.83    -0.807   0.086    0.633    1.71
22    0.681   -2.81    -0.473   0.814    3.22
23    0.651   -2.52    -0.955   0.578    2.86
24    1.01    -2.49    -0.775   0.074    1.94
25    1.82    -2.45    -1.09    -0.415   1.2
26    1.3     -1.62    -0.302   0.32     1.77
27    1.98    -1.57    -0.566   -0.019   1.06
28    0.794   -3.1     -0.736   0.261    2.34
29    0.443   -6.88    -4.07    -1.91    1.44
30    1.65    -2.11    -0.895   -0.236   0.992
31    1.48    -2.17    -0.712   -0.135   1.37
32    1.4     -2.34    -0.867   -0.205   1.43
33    1.87    -0.512   0.322    0.982    2.17
34    0.726   -2.27    -0.024   1.24     3.95
35    0.48    -6.08    -2.17    0.209    4.13
36    1.31    -1.6     0.377    1.15     2.73
37    1.38    -2.14    0.034    0.771    2.49
38    1.21    -2.18    -0.431   0.241    2.07
39    1.13    -1.6     -0.276   0.407    1.94
40    0.863   -2.98    -0.912   -0.208   1.75
41    0.551   -5.2     -1.43    0.206    3.79
42    0.731   -1.86    1.44     2.84     5.13
43    1       -3.44    -0.924   0.161    2.63
44    1.3     -2.31    -0.542   0.022    1.77
45    1.66    -1.84    -0.559   0.034    1.44
46    0.719   -3.57    -0.548   0.512    2.88
47    0.708   -3.98    -0.504   0.632    3.29
48    1.57    -1.21    0.831    1.56     2.86
49    1.29    -2.45    -0.848   -0.41    1.39
50    1.44    -1.61    0.544    1.3      2.86
51    1.69    -1.07    0.525    1.1      2.29
52    0.87    -3.11    -0.809   -0.015   2.12
53    0.367   -5.04    -1.54    -0.266   4.11
54    1.13    -2.36    0.287    1.46     3.34
55    1.11    -1.71    0.752    1.57     3.34
56    1.05    -3.14    -1.04    -0.059   2.09
57    1.46    -1.92    -0.159   0.765    2.07
58    0.861   -1       1.53     2.31     4.21
59    0.446   -8.07    -4.11    -1.88    2.44
60    1.38    -1.93    0.014    0.774    2.31

Table 4
Item Parameter Estimates for Males

Item  a       b1      b2      b3      b4
1     1.333   -2.958  -1.55   -0.8    1.113
2     0.977   -2.203  -0.745  0.23    2.073
3     1.784   -1.978  -0.948  -0.156  1.134
4     1.117   -1.805  -0.307  0.767   2.603
5     0.319   -5.642  -2.111  0.848   5.491
6     1.597   -1.006  0.101   0.94    2.246
7     1.46    -1.938  -0.752  -0.015  1.358
8     1.225   -2.784  -1.301  -0.447  1.267
9     1.852   -0.938  -0.073  0.652   1.889
10    1.245   -2.713  -1.356  -0.552  1.011
11    0.854   -3.152  -1.713  -0.706  1.246
12    1.382   -1.835  -0.441  0.604   2.297
13    1.588   -1.138  0.182   1.052   2.481
14    1.568   -1.754  -0.526  0.178   1.522
15    2.244   -1.376  -0.506  0.141   1.164
16    1       -2.499  -0.976  -0.139  1.664
17    1.068   -2.958  -1.244  -0.318  1.511
18    1.911   -1.072  0.006   0.685   1.96
19    2.195   -1.509  -0.546  -0.013  1.011
20    1.862   -1.237  -0.072  0.701   1.909
21    1.989   -0.945  -0.136  0.39    1.389
22    0.813   -2.672  -0.727  0.463   2.695
23    0.537   -3.152  -1.397  0.777   3.45
24    1.088   -2.305  -0.704  0.148   1.971
25    1.764   -2.499  -1.146  -0.443  1.154
26    1.156   -1.672  -0.431  0.23    1.777
27    1.784   -1.662  -0.703  -0.065  1.052
28    0.854   -3.54   -1.397  -0.367  1.593
29    0.387   -8.07   -4.591  -1.958  2.052
30    1.529   -1.825  -0.636  0.109   1.399
31    1.401   -2.009  -0.545  0.06    1.603
32    1.235   -2.295  -0.906  -0.231  1.409
33    2.048   -0.936  -0.123  0.516   1.685
34    0.803   -2.346  -0.37   0.828   3.409
35    0.423   -6.57   -2.142  0.709   5.297
36    1.411   -1.291  0.538   1.277   2.756
37    1.382   -1.999  0.112   0.838   2.409
38    1.098   -2.009  -0.411  0.249   2.062
39    1.225   -1.927  -0.714  -0.033  1.358
40    0.931   -2.652  -0.814  -0.089  1.818
41    0.626   -4.591  -1.509  -0.096  3.175
42    0.752   -1.217  1.797   3.256   5.318
43    1.166   -2.529  -0.348  0.518   2.685
44    1.225   -2.193  -0.569  0.051   1.726
45    1.715   -1.958  -0.804  -0.208  1.113
46    0.804   -3.315  -0.636  0.497   2.767
47    0.735   -3.438  -0.453  0.777   3.511
48    1.637   -1.325  0.527   1.318   2.593
49    1.176   -2.468  -0.677  -0.2    1.593
50    1.431   -1.307  0.55    1.277   2.664
51    1.94    -1.179  0.185   0.712   1.828
52    1       -2.856  -0.866  -0.117  1.797
53    0.287   -5.978  -1.784  -0.166  5.369
54    1.205   -2.223  0.091   1.164   2.848
55    1.205   -1.682  0.499   1.369   3.236
56    0.809   -3.223  -0.943  0.133   2.562
57    1.607   -1.835  -0.327  0.478   1.756
58    0.99    -0.925  1.369   2.164   3.807
59    0.368   -9.223  -4.733  -1.201  3.756
60    1.392   -1.601  0.287   1.032   2.481

Note. Item parameters transformed to the scale of females. Transformed coefficients:
slope (A) = 1.0204, intercept (K) = -0.2947. Age-group sample sizes: age 20 and under
(n = 7333), ages 20 to 30 (n = 8080), over 30 to 55 (n = 5226).
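
The slope (A) and intercept (K) reported in the table notes are the coefficients of a linear scale transformation. As a minimal sketch, assuming the conventional IRT linking form (theta* = A * theta + K, so that thresholds become A * b + K and discriminations become a / A), the transformation and its inverse could be written as below. The function names and the example item are hypothetical, and the tabled values are presumably already on the transformed scale.

```python
def to_reference_scale(a, bs, A, K):
    """Put one item's parameters on the reference-group scale.

    Assumes the conventional linear linking relations:
    a* = a / A and b* = A * b + K.
    """
    return a / A, [A * b + K for b in bs]


def from_reference_scale(a_star, bs_star, A, K):
    """Inverse of to_reference_scale."""
    return a_star * A, [(b - K) / A for b in bs_star]


# Coefficients from the note to Table 4.
A, K = 1.0204, -0.2947

# Hypothetical untransformed item (not taken from the paper).
a_raw, bs_raw = 1.2, [-2.0, -0.5, 0.5, 2.0]

a_linked, bs_linked = to_reference_scale(a_raw, bs_raw, A, K)
a_back, bs_back = from_reference_scale(a_linked, bs_linked, A, K)

print(a_linked, bs_linked)
```

Because A is positive, the transformation preserves the ordering of the four thresholds.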

Table 5
Item Parameter Estimates for Age 20 and Under

Item  a       b1      b2      b3      b4
1     1.461   -2.682  -1.389  -0.637  1.132
2     1.146   -1.96   -0.618  0.401   1.94
3     1.861   -1.56   -0.537  0.227   1.369
4     1.114   -1.779  -0.343  0.707   2.34
5     0.456   -4.509  -1.769  0.483   3.624
6     1.661   -1.037  -0.02   0.775   1.95
7     1.503   -1.703  -0.521  0.243   1.455
8     1.367   -2.445  -1.094  -0.244  1.265
9     1.724   -0.734  0.143   0.939   2.045
10    1.241   -2.825  -1.436  -0.534  0.961
11    0.77    -4.176  -2.369  -1.151  1.075
12    1.241   -2.131  -0.606  0.623   2.302
13    1.598   -1.265  0.009   0.902   2.216
14    1.787   -1.589  -0.446  0.217   1.331
15    2.26    -1.018  -0.161  0.45    1.36
16    0.918   -2.311  -0.656  0.22    2.121
17    0.977   -2.968  -1.237  -0.228  1.55
18    1.956   -0.999  0.061   0.778   1.912
19    2.271   -1.389  -0.525  -0.024  0.909
20    1.892   -0.83   0.276   1.037   2.064
21    1.84    -0.79   0.052   0.608   1.598
22    0.771   -2.121  -0.207  0.932   2.996
23    0.541   -2.835  -1.104  1.208   3.757
24    0.958   -2.93   -1.132  -0.131  1.864
25    1.861   -2.283  -0.989  -0.255  1.236
26    1.304   -1.503  -0.307  0.339   1.626
27    1.861   -1.484  -0.573  0.022   1.008
28    0.801   -3.139  -0.961  0.103   1.997
29    0.427   -6.744  -3.776  -1.513  1.807
30    1.588   -1.902  -0.809  -0.073  1.056
31    1.609   -1.617  -0.402  0.152   1.417
32    1.451   -1.912  -0.697  -0.056  1.369
33    1.735   -0.758  0.153   0.882   2.026
34    0.711   -2.521  -0.286  1.132   3.785
35    0.43    -6.63   -2.321  1.103   5.402
36    1.419   -1.322  0.4     1.198   2.625
37    1.524   -1.541  0.308   1.065   2.577
38    1.199   -1.874  -0.398  0.311   1.959
39    1.02    -1.693  -0.304  0.545   2.13
40    1.049   -2.178  -0.65   -0.035  1.541
41    0.546   -4.842  -1.436  0.34    3.871
42    0.728   -1.731  1.265   2.73    5.003
43    1.062   -2.606  -0.553  0.535   2.644
44    1.335   -1.95   -0.501  0.087   1.645
45    1.619   -1.655  -0.502  0.12    1.408
46    0.795   -3.206  -0.525  0.604   2.844
47    0.719   -3.301  -0.233  1.046   3.7
48    1.503   -1.132  0.742   1.588   2.92
49    1.188   -2.388  -0.812  -0.289  1.522
50    1.535   -1.218  0.603   1.35    2.654
51    1.777   -0.932  0.431   1.018   2.102
52    1.001   -2.568  -0.536  0.263   2.102
53    0.328   -4.195  -0.492  1.217   6.002
54    1.135   -2.35   0.147   1.522   3.414
55    1.209   -1.551  0.629   1.503   3.196
56    0.97    -2.92   -1.065  -0.024  2.083
57    1.598   -1.455  -0.035  0.755   1.893
58    0.917   -0.792  1.493   2.302   4.08
59    0.466   -7.077  -3.519  -0.49   3.405
60    1.493   -1.484  0.13    0.906   2.283

Note. Item parameters transformed to the scale of the mid-age group. Transformed
coefficients: slope (A) = 0.9511/0.9312, intercept (K) = 0.0284/0.0061.

Table 6
Item Parameter Estimates for Age 20 to 30 Group

Item  a       b1      b2      b3      b4
1     1.49    -2.92   -1.55   -0.927  0.833
2     1.11    -1.94   -0.501  0.386   2.09
3     1.78    -1.82   -0.675  0.114   1.42
4     1.13    -1.85   -0.285  0.723   2.48
5     0.494   -3.97   -1.46   0.248   3.24
6     1.7     -1.04   0.016   0.703   1.96
7     1.56    -1.73   -0.538  0.164   1.51
8     1.3     -2.58   -1.07   -0.285  1.4
9     1.75    -0.705  0.234   1.01    2.26
10    1.3     -2.71   -1.26   -0.477  1.09
11    0.932   -2.77   -1.35   -0.454  1.41
12    1.48    -1.75   -0.314  0.581   2.21
13    1.67    -1.12   0.167   0.868   2.23
14    1.84    -1.57   -0.408  0.214   1.46
15    2.29    -1.15   -0.212  0.416   1.43
16    0.892   -2.39   -0.522  0.341   2.31
17    1.06    -2.54   -0.743  0.179   2.04
18    1.94    -0.833  0.262   0.883   2.14
19    2.24    -1.49   -0.488  -0.026  1.02
20    1.85    -0.918  0.313   1.05    2.28
21    1.88    -0.717  0.164   0.681   1.75
22    0.743   -2.64   -0.438  0.768   3.11
23    0.596   -2.95   -1.21   0.496   2.9
24    1.06    -2.34   -0.639  0.157   1.97
25    1.82    -2.4    -1      -0.379  1.25
26    1.26    -1.42   -0.172  0.41    1.91
27    1.96    -1.56   -0.555  -0.02   1.09
28    0.779   -3.32   -0.915  0.093   2.23
29    0.428   -7.4    -4.32   -1.97   1.62
30    1.66    -1.84   -0.694  -0.072  1.14
31    1.55    -1.95   -0.552  -0.03   1.45
32    1.38    -2.14   -0.774  -0.154  1.45
33    1.85    -0.564  0.303   0.933   2.19
34    0.704   -2.44   -0.031  1.23    4.16
35    0.505   -5.84   -2.02   0.162   4.06
36    1.44    -1.26   0.569   1.25    2.79
37    1.48    -1.94   0.135   0.79    2.41
38    1.21    -1.87   -0.25   0.34    2.17
39    1.14    -1.72   -0.371  0.249   1.71
40    0.93    -2.63   -0.71   -0.056  1.83
41    0.586   -4.99   -1.48   -0.04   3.4
42    0.84    -1.23   1.64    2.9     4.89
43    1.16    -2.71   -0.5    0.345   2.59
44    1.34    -2.08   -0.431  0.1     1.77
45    1.74    -1.82   -0.565  0.027   1.43
46    0.801   -3.12   -0.437  0.551   2.73
47    0.813   -3.35   -0.42   0.598   3.08
48    1.64    -1.13   0.824   1.53    2.81
49    1.34    -2.27   -0.641  -0.258  1.43
50    1.5     -1.3    0.662   1.37    2.85
51    1.81    -1.01   0.491   1.03    2.2
52    0.98    -2.72   -0.654  0.043   2.01
53    0.335   -6.08   -2.28   -0.974  3.85
54    1.2     -2.09   0.367   1.42    3.19
55    1.19    -1.46   0.845   1.61    3.38
56    1.01    -2.91   -0.884  0.041   2.24
57    1.62    -1.7    -0.082  0.762   2.04
58    0.946   -0.831  1.52    2.3     4.04
59    0.443   -8.13   -4.26   -1.88   2.43
60    1.51    -1.55   0.254   0.934   2.34

Table 7
Item Parameter Estimates for Over 30 to 55

Item  a       b1      b2      b3      b4
1     1.329   -3.107  -1.676  -1.018  0.965
2     1.12    -2.106  -0.486  0.388   2.225
3     1.809   -1.776  -0.667  0.027   1.395
4     1.19    -1.796  -0.328  0.582   2.255
5     0.469   -4.467  -2.006  -0.4    2.806
6     1.699   -1.126  0.012   0.733   1.945
7     1.679   -1.636  -0.457  0.214   1.555
8     1.339   -2.506  -0.97   -0.175  1.645
9     1.779   -0.806  0.122   0.896   2.235
10    1.359   -2.376  -1.023  -0.235  1.325
11    1       -2.316  -0.938  -0.015  1.705
12    1.649   -1.646  -0.337  0.516   2.005
13    1.719   -0.97   0.272   1.025   2.376
14    1.859   -1.646  -0.371  0.267   1.595
15    2.209   -1.176  -0.287  0.387   1.505
16    0.88    -2.206  -0.459  0.462   2.646
17    1.23    -2.266  -0.653  0.248   2.025
18    1.959   -0.781  0.301   0.89    2.095
19    2.439   -1.506  -0.447  0.054   1.135
20    1.949   -0.872  0.306   1.095   2.335
21    1.979   -0.772  0.067   0.603   1.655
22    0.764   -3.057  -0.777  0.502   2.866
23    0.736   -2.186  -0.809  0.383   2.666
24    1.17    -1.706  -0.156  0.636   2.365
25    1.889   -2.256  -0.992  -0.344  1.245
26    1.319   -1.626  -0.299  0.334   1.965
27    2.009   -1.426  -0.418  0.177   1.365
28    0.734   -3.497  -0.87   0.208   2.676
29    0.398   -7.918  -4.667  -2.186  1.845
30    1.789   -1.696  -0.528  0.08    1.325
31    1.519   -2.186  -0.653  -0.08   1.495
32    1.349   -2.416  -0.834  -0.161  1.635
33    2.039   -0.432  0.306   0.935   2.155
34    0.83    -1.726  0.162   1.225   3.736
35    0.535   -5.077  -1.616  0.025   3.586
36    1.409   -1.356  0.566   1.235   2.636
37    1.449   -2.256  -0.088  0.581   2.195
38    1.27    -2.086  -0.336  0.295   2.135
39    1.28    -1.596  -0.411  0.202   1.725
40    0.81    -3.247  -0.909  -0.1    2.075
41    0.609   -4.777  -1.186  0.288   3.746
42    0.797   -1.276  1.895   3.126   5.137
43    1.28    -2.746  -0.568  0.203   2.305
44    1.29    -2.256  -0.391  0.191   2.045
45    1.709   -1.856  -0.606  -0.062  1.355
46    0.707   -3.567  -0.427  0.621   3.016
47    0.743   -3.817  -0.492  0.54    3.006
48    1.719   -1.146  0.872   1.515   2.706
49    1.389   -2.226  -0.586  -0.187  1.525
50    1.519   -1.476  0.554   1.225   2.826
51    1.699   -1.106  0.579   1.115   2.365
52    0.902   -3.137  -0.986  -0.257  1.855
53    0.398   -5.347  -1.876  -0.816  3.556
54    1.18    -2.076  0.438   1.315   3.006
55    1.13    -1.676  0.743   1.545   3.286
56    0.984   -3.007  -0.716  0.249   2.516
57    1.439   -2.186  -0.275  0.684   2.095
58    0.961   -0.915  1.545   2.195   3.876
59    0.434   -8.699  -4.267  -2.396  2.285
60    1.409   -1.806  0.193   0.835   2.325

Note. Item parameters transformed to the scale of the mid-age group. Transformed
coefficients: slope (A) = 1.0004, intercept (K) = -0.0955.

Table 8
NCDIF Results for Gender Comparison

Item  CDIF    NCDIF   CHI        PROB
1     -0.305  0.032   65757.45   0
2     0.095   0.005   7659.59    0
3     0.489   0.079   74533.57   0
4     -0.273  0.024   482511.3   0
5     -0.429  0.061   31370.48   0
6     -0.541  0.09    227460.6   0
7     0.277   0.027   36078.74   0
8     0.171   0.012   20183.67   0
9     0.412   0.054   37389.77   0
10    0.005   0       3986.27    0
11    0.095   0.005   9512.23    0
12    -0.204  0.013   355565.2   0
13    -0.339  0.036   122129.9   0
14    -0.082  0.005   4437.08    0
15    0.578   0.105   82429.19   0
16    0.632   0.124   301559.4   0
17    0.479   0.076   112498.8   0
18    0.24    0.018   90057.97   0
19    -0.213  0.015   58249.98   0
20    0.692   0.148   149318.9   0
21    0.422   0.056   36956.57   0
22    0.237   0.019   26637.94   0
23    0.036   0.004   3522.27    0
24    -0.075  0.002   14920.82   0
25    0.045   0.001   31186.02   0
26    0.103   0.005   11802.8    0
27    0.098   0.006   7894.9     0
28    0.617   0.119   *********  0
29    -0.096  0.003   15829.33   0
30    -0.522  0.084   211373.5   0
31    -0.283  0.025   241565     0
32    -0.01   0       3084.1     0.1391
33    0.809   0.201   64966.23   0
34    0.277   0.024   73261.99   0
35    -0.196  0.012   55642.44   0
36    -0.231  0.017   136274.6   0
37    -0.082  0.002   51933.67   0
38    -0.047  0.001   40429.44   0
39    0.622   0.119   239165.6   0
40    -0.107  0.005   20446.29   0
41    0.155   0.008   27256.54   0
42    -0.304  0.029   *********  0
43    -0.443  0.068   46497.76   0
44    -0.016  0       25316.25   0
45    0.401   0.05    168485.6   0
46    0.025   0.001   3740       0
47    -0.125  0.005   210508.3   0
48    0.262   0.023   27900.43   0
49    -0.251  0.019   109384.6   0
50    -0.057  0.003   5801.24    0
51    0.448   0.069   18906.95   0
52    0.133   0.007   10864.84   0
53    -0.088  0.004   7046.63    0
54    0.184   0.012   15091.13   0
55    0.136   0.007   16500.2    0
56    -0.231  0.018   19753.09   0
57    0.273   0.025   20114.33   0
58    -0.012  0.001   3163.24    0.0188
59    -0.282  0.025   104019.8   0
60    -0.339  0.036   346744.3   0
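
To illustrate how item-level flags can be read off the NCDIF column of Table 8, the sketch below applies a fixed cutoff to the 60 gender-comparison values. The cutoff of 0.096 is an assumption: it is a value often cited in the DFIT literature for five-category items, and this section does not state the cutoff the authors used. The function name is hypothetical.

```python
# NCDIF values for the gender comparison, items 1 to 60, copied from Table 8.
ncdif_gender = [
    0.032, 0.005, 0.079, 0.024, 0.061, 0.09, 0.027, 0.012, 0.054, 0,
    0.005, 0.013, 0.036, 0.005, 0.105, 0.124, 0.076, 0.018, 0.015, 0.148,
    0.056, 0.019, 0.004, 0.002, 0.001, 0.005, 0.006, 0.119, 0.003, 0.084,
    0.025, 0, 0.201, 0.024, 0.012, 0.017, 0.002, 0.001, 0.119, 0.005,
    0.008, 0.029, 0.068, 0, 0.05, 0.001, 0.005, 0.023, 0.019, 0.003,
    0.069, 0.007, 0.004, 0.012, 0.007, 0.018, 0.025, 0.001, 0.025, 0.036,
]

# Assumed cutoff for five-category items; not stated in this section.
CUTOFF = 0.096


def flag_items(ncdif, cutoff):
    """Return the 1-based item numbers whose NCDIF exceeds the cutoff."""
    return [item for item, value in enumerate(ncdif, start=1) if value > cutoff]


flagged = flag_items(ncdif_gender, CUTOFF)
share = len(flagged) / len(ncdif_gender)

print("flagged items:", flagged)
print(f"share of items flagged: {share:.0%}")
```

Under this assumed cutoff, six of the 60 items are flagged (Items 15, 16, 20, 28, 33, and 39), which agrees with the 10% figure in the abstract.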

Table 9
NCDIF Results for Younger/Mid Comparison

Item  CDIF    NCDIF   CHI        PROB
1     -0.025  0.031   156049.1   0
2     0.009   0.002   49070.14   0
3     -0.006  0.012   12914      0
4     0.006   0.001   14985.15   0
5     -0.004  0.001   3012.47    0.4277
6     -0.001  0       3006.79    0.4566
7     -0.003  0.001   22632.92   0
8     0.008   0.001   4263.68    0
9     0.016   0.006   31342.4    0
10    0.008   0.005   56039.09   0
11    0.032   0.117   29501.23   0
12    0.005   0.017   9903.11    0
13    0.01    0.007   34707.98   0
14    0.007   0.001   60969.46   0
15    -0.002  0.003   8361.93    0
16    0.014   0.005   39022.35   0
17    0.039   0.073   255689     0
18    0.031   0.026   82462.38   0
19    0.012   0.002   4749.07    0
20    0.007   0.001   3058.96    0.2184
21    0.021   0.011   62108.03   0
22    -0.013  0.013   49015.58   0
23    -0.033  0.03    81415.7    0
24    0.024   0.048   45354.43   0
25    -0.003  0.002   13487.18   0
26    0.022   0.012   49628.28   0
27    0.001   0       3185.76    0.0089
28    0.007   0.001   7707.43    0
29    -0.014  0.011   169173.3   0
30    0.006   0.003   46088.59   0
31    -0.012  0.02    20668.89   0
32    0       0.004   6943.36    0
33    0.024   0.022   88686.33   0
34    0.014   0.006   209323.2   0
35    -0.027  0.017   30763.34   0
36    0.016   0.007   58771.66   0
37    -0.028  0.029   553754.9   0
38    0.013   0.005   105129.1   0
39    -0.031  0.019   19566.32   0
40    0.012   0.005   3021.92    0.3807
41    -0.017  0.009   155560     0
42    0.027   0.03    *********  0
43    -0.009  0.002   18946.26   0
44    0.008   0.001   9545.24    0
45    -0.009  0.004   60110.19   0
46    0       0       4949.14    0
47    -0.027  0.018   40084.59   0
48    0.005   0.001   16284.11   0
49    -0.003  0.003   4550.96    0
50    0.006   0.001   3317.68    0
51    0.007   0.001   4235.67    0
52    -0.013  0.008   170391.9   0
53    -0.078  0.246   *********  0
54    0.007   0.005   21675.56   0
55    0.017   0.007   45598.42   0
56    0.009   0.004   359302.5   0
57    0.002   0.002   4883.88    0
58    0.002   0       34757.2    0
59    -0.039  0.069   761946.9   0
60    0.011   0.002   14965.53   0

Table 10
NCDIF Results for Older/Mid Comparison

Item  CDIF    NCDIF   CHI        PROB
1     0.005   0.002   3165.09    0.0172
2     -0.001  0       3060.5     0.2127
3     -0.012  0.001   10862.24   0
4     -0.022  0.003   11177.46   0
5     -0.079  0.033   462746.1   0
6     -0.006  0       6488.94    0
7     0.025   0.004   20249.52   0
8     0.038   0.008   770048.4   0
9     -0.04   0.008   56525.04   0
10    0.074   0.03    82262.64   0
11    0.103   0.056   91454.2    0
12    -0.01   0.003   3886.54    0
13    0.051   0.013   187557.6   0
14    0.016   0.002   10532.46   0
15    -0.016  0.002   7652.59    0
16    0.036   0.007   252365.8   0
17    0.022   0.004   7446.37    0
18    0.013   0.001   18526.38   0
19    0.028   0.004   53356.36   0
20    0.013   0.001   95998.38   0
21    -0.033  0.006   20861.17   0
22    -0.066  0.022   *********  0
23    0.035   0.011   6683.02    0
24    0.147   0.113   193635.1   0
25    0.006   0.001   5607.77    0
26    -0.035  0.006   81157.16   0
27    0.081   0.034   124862.7   0
28    0.027   0.004   16867.4    0
29    0       0       3012.99    0.4251
30    0.064   0.021   83319.77   0
31    -0.025  0.004   11771.55   0
32    -0.004  0.002   3177.73    0.0115
33    0.028   0.005   9689.28    0
34    0.052   0.017   15058.1    0
35    0.007   0.001   3950.7     0
36    -0.014  0.001   133685.9   0
37    -0.072  0.027   346580.4   0
38    -0.026  0.004   405505.3   0
39    -0.002  0.001   3028.78    0.3476
40    -0.02   0.006   4633.15    0
41    0.047   0.012   *********  0
42    0.012   0.001   7621.37    0
43    -0.033  0.006   23812.28   0
44    0.022   0.004   7797.24    0
45    -0.025  0.003   126329.3   0
46    0       0.002   3002.96    0.4762
47    -0.024  0.003   23178.51   0
48    0.005   0       6560.07    0
49    0.021   0.002   122647.5   0
50    -0.036  0.007   246381.2   0
51    0.007   0.002   3447.18    0
52    -0.072  0.028   65397.04   0
53    0.008   0.001   4229.74    0
54    -0.003  0       4475.51    0
55    -0.034  0.006   397055.9   0
56    0.044   0.01    91483.62   0
57    -0.06   0.021   20080.37   0
58    -0.009  0       66128.66   0
59    -0.026  0.003   196983.7   0
60    -0.038  0.008   53712.91   0
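
The CDIF and NCDIF columns in Tables 8 to 10 come from the DFIT framework, in which each index is an average over focal-group trait levels of differences in expected item scores. The sketch below shows one way these quantities could be computed from graded-response item parameters. It is illustrative only: the logistic form without a scaling constant, the choice of males as the focal group, and the simulated standard-normal thetas are all assumptions, so the output is not expected to reproduce the tabled values. The function names are hypothetical.

```python
import math
import random


def expected_score(theta, a, bs):
    """Expected item score (0..m) under a graded response model.

    The expected score equals the sum of the boundary probabilities.
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs)


def dfit_indices(focal, reference, thetas):
    """Return (NCDIF per item, CDIF per item, DTF) for matched item lists.

    d_i = ES_focal - ES_reference for item i; D is the sum of d_i over items.
    NCDIF_i = mean(d_i^2), CDIF_i = mean(d_i * D), DTF = mean(D^2).
    """
    n = len(thetas)
    d = [[expected_score(t, *f) - expected_score(t, *r) for t in thetas]
         for f, r in zip(focal, reference)]
    total = [sum(d_i[s] for d_i in d) for s in range(n)]
    ncdif = [sum(x * x for x in d_i) / n for d_i in d]
    cdif = [sum(d_i[s] * total[s] for s in range(n)) / n for d_i in d]
    dtf = sum(x * x for x in total) / n
    return ncdif, cdif, dtf


# Items 15, 16, and 20 as (a, [b1, b2, b3, b4]) from Tables 3 and 4.
females = [(2.32, [-1.1, -0.19, 0.436, 1.42]),
           (0.903, [-2.23, -0.423, 0.436, 2.44]),
           (2.02, [-0.79, 0.362, 1.09, 2.19])]
males = [(2.244, [-1.376, -0.506, 0.141, 1.164]),
         (1.0, [-2.499, -0.976, -0.139, 1.664]),
         (1.862, [-1.237, -0.072, 0.701, 1.909])]

rng = random.Random(0)  # fixed seed; simulated thetas are an assumption
thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]

ncdif, cdif, dtf = dfit_indices(males, females, thetas)
same_ncdif, same_cdif, same_dtf = dfit_indices(females, females, thetas)
print(ncdif, cdif, dtf)
```

Because CDIF is defined through the summed difference D, the CDIF values add up to the test-level index (DTF), whereas NCDIF treats each item in isolation. In a full analysis, D would be summed over all 60 items rather than the three shown here.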

Figure Captions
Figure 1. Age-based DIF, boundary response functions (cumulative probability of a
response above k) for Item 11, Don't know why I do some of the things I do. Solid lines
indicate the mid-range age group, dotted lines indicate the younger age group.
Figure 2. Age-based DIF, boundary response functions for the item Never spend
more than I can afford. (Immoderation, Item 53). Solid lines indicate the mid-range age
group, dotted lines indicate the younger age group.
Figure 3. Age-based DIF, boundary response functions for the item Readily
overcome setbacks. (Vulnerability, Item 24). Solid lines indicate the mid-range age
group, dotted lines indicate the older age group.
Figure 4. Gender-based DIF, boundary response functions for the item Am often
down in the dumps. (Depression, Item 15). Solid line = females, dotted line = males.
Figure 5. Gender-based DIF, boundary response functions for the item Find it
difficult to approach others. (Self-Consciousness, Item 16). Solid line = females, dotted
line = males.
Figure 6. Gender-based DIF, boundary response functions for the item Am often
in a bad mood. (Anger, Item 20). Solid line = females, dotted line = males.
Figure 7. Gender-based DIF, boundary response functions for the item Only feel
comfortable with friends (Self-Consciousness, Item 28). Solid line = females, dotted line
= males.
Figure 8. Gender-based DIF, boundary response functions for the item Feel that
my life lacks direction. (Depression, Item 39). Solid line = females, dotted line = males.

Figure 9. Gender-based DIF, category response curves for the item Am often
down in the dumps (Depression, Item 15). Solid lines = females, dotted = males.
Figure 10. Gender-based DIF, category response curves for the item Find it
difficult to approach others (Self-Consciousness, Item 16). Solid lines = females, dotted
= males.
Figure 11. Gender-based DIF, category response curves for the item Am often in
a bad mood (Anger, Item 20). Solid lines = females, dotted = males.
Figure 12. Age-based DIF, category response curves for the item Don't know
why I do some of the things I do (Immoderation, Item 11). Solid lines = mid-age group,
dotted lines = younger group.
Figure 13. Age-based DIF, category response curves for the item Never spend
more than I can afford (Immoderation, Item 53). Solid lines = mid-age group, dotted
lines = younger group.
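
The curves named in these captions can be reproduced from the item parameters in Tables 3 to 7. The following is a minimal sketch, assuming a graded response model with a logistic boundary function and no additional scaling constant; the section does not state the exact parameterization, and the function names are hypothetical. It uses Item 15 (Figures 4 and 9) as the example.

```python
import math


def boundary_probs(theta, a, bs):
    """Boundary response functions: P(response above boundary k) for each k."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]


def category_probs(theta, a, bs):
    """Category response curves: differences of adjacent boundary functions."""
    p_star = [1.0] + boundary_probs(theta, a, bs) + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(bs) + 1)]


# Item 15 parameters (a, [b1, b2, b3, b4]) from Tables 3 and 4.
item15_female = (2.32, [-1.1, -0.19, 0.436, 1.42])
item15_male = (2.244, [-1.376, -0.506, 0.141, 1.164])

# Evaluate both groups on the grid used by the figure axes (-4.0 to 4.0).
grid = [x / 2.0 for x in range(-8, 9)]
curves_female = [boundary_probs(t, *item15_female) for t in grid]
curves_male = [boundary_probs(t, *item15_male) for t in grid]

# Values at theta = 0 for a quick side-by-side comparison.
bf_at_zero_female = boundary_probs(0.0, *item15_female)
bf_at_zero_male = boundary_probs(0.0, *item15_male)
cat_at_zero_female = category_probs(0.0, *item15_female)

print(bf_at_zero_female)
print(bf_at_zero_male)
```

Plotting curves_female and curves_male against grid with any plotting library gives one solid and one dotted line per boundary, which is the layout the captions describe.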

Figure 1. Age-based DIF, boundary response functions (cumulative probability of a
response above k) for Item 11, Don't know why I do some of the things I do. Solid lines
indicate the mid-range age group, dotted lines indicate the younger age group.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 2. Age-based DIF, boundary response functions for the item Never spend
more than I can afford. (Immoderation, Item 53). Solid lines indicate the mid-range age
group, dotted lines indicate the younger age group.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 3. Age-based DIF, boundary response functions for the item Readily
overcome setbacks. (Vulnerability, Item 24). Solid lines indicate the mid-range age
group, dotted lines indicate the older age group.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 4. Gender-based DIF, boundary response functions for the item Am often
down in the dumps. (Depression, Item 15). Solid line = females, dotted line = males.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 5. Gender-based DIF, boundary response functions for the item Find it
difficult to approach others. (Self-Consciousness, Item 16). Solid line = females, dotted
line = males.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 6. Gender-based DIF, boundary response functions for the item Am often
in a bad mood. (Anger, Item 20). Solid line = females, dotted line = males.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 7. Gender-based DIF, boundary response functions for the item Only feel
comfortable with friends (Self-Consciousness, Item 28). Solid line = females, dotted line
= males.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 8. Gender-based DIF, boundary response functions for the item Feel that
my life lacks direction. (Depression, Item 39). Solid line = females, dotted line = males.
[Plot: P(theta) on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 9. Gender-based DIF, category response curves for the item Am often
down in the dumps (Depression, Item 15). Solid lines = females, dotted = males.
[Plot: P(Theta) from 0 to 1 on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 10. Gender-based DIF, category response curves for the item Find it difficult to
approach others (Self-Consciousness, Item 16). Solid lines = females, dotted = males.
[Plot: P(Theta) from 0 to 1 on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 11. Gender-based DIF, category response curves for the item Am often in
a bad mood (Anger, Item 20). Solid lines = females, dotted = males.
[Plot: P(Theta) from 0 to 1 on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 12. Age-based DIF, category response curves for the item Don't know why I do
some of the things I do (Immoderation, Item 11). Solid lines = mid-age group, dotted
lines = younger group.
[Plot: P(Theta) from 0 to 1 on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]

Figure 13. Age-based DIF, category response curves for the item Never spend
more than I can afford (Immoderation, Item 53). Solid lines = mid-age group, dotted
lines = younger group.
[Plot: P(Theta) from 0 to 1 on the vertical axis; Theta from -4.0 to 4.0 on the horizontal axis.]
