Dimensionality Analysis Package
Abstract
DIMPACK Version 1.0, software for assessing test dimensionality based on a nonparametric conditional covariance approach, is reviewed. The software was originally distributed by Assessment Systems Corporation and can now be freely accessed online. It consists of Windows-based interfaces for three components: DIMTEST, DETECT, and CCPROX/HAC, which conduct a hypothesis test of unidimensionality, cluster items, and perform hierarchical cluster analysis, respectively. Two simulation studies were conducted to evaluate the software in confirming test unidimensionality (a Type I error study) and in detecting multidimensionality (a statistical power study). The results suggested that an independent data set should always be used for selecting assessment subtest items, separate from the data used to calculate the DIMTEST statistic; otherwise, the Type I error rate was excessively inflated. Statistical power was low when the sample size was small or the dimensions were highly correlated. It is suggested that some major changes be made to the software before it can be genuinely useful to practitioners.
Keywords
test dimensionality, DIMPACK, item response theory software
Introduction
Item response theory (IRT) and its applications have been widely used in various educational
and psychological testing practices, including test construction, ability estimation, score report-
ing, equating, detection of differential item functioning (DIF), and computer adaptive testing.
One of the fundamental assumptions for the most commonly used IRT models is unidimension-
ality (Hambleton, Swaminathan, & Rogers, 1991), that is, only one single latent variable
accounts for the item responses. To justify the use of IRT, unidimensionality needs to be
1 University of Massachusetts Medical School, Worcester, USA
2 Graduate Management Admission Council, Reston, VA, USA
3 University of Massachusetts–Amherst, USA

Corresponding Author:
Nina Deng, University of Massachusetts Medical School, 55 Lake Ave. N., AC7-053, Worcester, MA 01655, USA
Email: nina.deng@umassmed.edu
Deng et al. 163
evaluated before any unidimensional IRT model is applied. However, many educational and psychological tests are not strictly unidimensional. It is therefore important to check whether a test satisfies the unidimensionality assumption and to determine how many dimensions may be needed to fit the test data. Checking the dimensionality structure of a test is thus a fundamental practice and should always be carried out before applying an IRT model, to avoid negative consequences.
distinct dimension. Each item can only be viewed as measuring one dimension. The items are split such that items in the same cluster have positive CCOV_{i,l}, whereas items in different clusters have negative CCOV_{i,l}. DETECT searches through possible item partitions (see the description of CCPROX/HAC below) and tries to find the optimal partition by maximizing the index D:
$$D(P, \theta_{TT}) = \frac{2}{n(n-1)} \sum_{1 \le i < l \le n} \delta_{i,l}\, \mathrm{CCOV}_{i,l}, \qquad (3)$$

where $\delta_{i,l} = 1$ if items i and l are in the same cluster and $-1$ if they are in different clusters. The number
of clusters should approximate the test dimensionality. Unlike DIMTEST, DETECT does not
provide a hypothesis test.
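As an illustration of Equation 3, the index D can be computed for a given partition as sketched below. This is not DIMPACK's implementation: the conditional covariances are estimated here by conditioning on the rest score (the total score excluding the two items in the pair), a common simplification, and all function names are illustrative.

```python
import numpy as np

def conditional_cov(responses, i, l):
    """Estimate CCOV_{i,l}: the covariance of items i and l conditional on
    the rest score (total score excluding items i and l), averaged over
    score groups. A simplified sketch of the usual estimator."""
    rest = responses.sum(axis=1) - responses[:, i] - responses[:, l]
    weighted_sum, total = 0.0, 0
    for s in np.unique(rest):
        grp = responses[rest == s]
        if len(grp) > 1:
            weighted_sum += len(grp) * np.cov(grp[:, i], grp[:, l], bias=True)[0, 1]
            total += len(grp)
    return weighted_sum / total if total else 0.0

def detect_index(responses, partition):
    """Index D of Equation 3; partition maps item index -> cluster label."""
    n = responses.shape[1]
    d = 0.0
    for i in range(n):
        for l in range(i + 1, n):
            delta = 1.0 if partition[i] == partition[l] else -1.0
            d += delta * conditional_cov(responses, i, l)
    return 2.0 * d / (n * (n - 1))
```

DETECT itself searches over candidate partitions to maximize D; this sketch only evaluates D for one given partition.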
CCPROX/HAC (Roussos, Stout, & Marden, 1998) is a dimensionality-sensitive cluster anal-
ysis procedure. HAC performs an agglomerative hierarchical cluster analysis, based on the
dimensionality-sensitive proximity matrix provided by CCPROX. The agglomerative HAC
quickly clusters the items progressively into larger dimensionally homogeneous groups. At each
level of the hierarchy, the two clusters with the minimum proximity are joined at the next level.
The proximity measure rccov for a pair of items i and l is formed by reversing the sign of CCOV_{i,l} and adding a constant: items with a positive CCOV thus have a small rccov value, whereas items with a negative CCOV have a large rccov value.
CCPROX and HAC are typically used together to suggest a plausible dimensional structure
information at each level of the cluster hierarchy. In particular, these procedures can jointly pro-
vide the initial item clustering for DETECT and possible AT items to be tested in DIMTEST
(Froelich & Habing, 2008). However, the algorithms do not automatically determine which
stage of the cluster analysis presents the optimal clustering of the items.
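The agglomerative step can be sketched with a generic SAHN implementation such as SciPy's; the proximity values below are illustrative placeholders, not output from CCPROX.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Illustrative proximity matrix for 4 items: small values suggest the same
# dimension (positive CCOV), large values suggest different dimensions.
prox = np.array([[0.0, 0.1, 0.9, 0.8],
                 [0.1, 0.0, 0.8, 0.9],
                 [0.9, 0.8, 0.0, 0.2],
                 [0.8, 0.9, 0.2, 0.0]])

# linkage expects a condensed distance vector; average linkage is one of
# the SAHN variants. At each level the two closest clusters are joined.
tree = linkage(squareform(prox), method="average")

# Cut the hierarchy into two clusters as a candidate dimensional structure.
labels = fcluster(tree, t=2, criterion="maxclust")
```

As the review notes, the algorithm produces the full hierarchy but does not itself say which level is the optimal clustering; the cut at two clusters here is the analyst's choice.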
Program Description
DIMPACK (Version 1.0) is a comprehensive computer software package incorporating the var-
ious nonparametric CCOV-based dimensionality analyses for dichotomous items. It was devel-
oped by Roussos and Stout at the William Stout Institute for Measurement (2006) and is
distributed by Assessment Systems Corporation. It can now be freely accessed at the PsychoSource website maintained by Measured Progress (http://psychometrictools.measuredprogress.org/dif2). The program has Microsoft Windows®-based interfaces for three software components: DIMTEST, DETECT, and CCPROX/HAC, each of which was previously distributed as
stand-alone DOS programs. DIMPACK integrates the three components into one package and
streamlines the analyses of clustering items, detecting multidimensionality and testing the uni-
dimensionality hypothesis.
The software limits the maximum number of items to 150 and the maximum number of
examinees to 7,000. It requires Microsoft Windows 98® or later and .NET Framework 1.1 or
later to run. It is easy to install and very user-friendly to run. To install the package after down-
loading, the user simply runs the "setup.exe" program and follows the on-screen instructions. The default directory to which the program is installed is "c:\Program Files\Dimpack1.0\."
Below is the description of each interface component in the package.
DIMTEST
The overall interface of DIMTEST is quite user-friendly and the steps are straightforward.
There are three general program execution steps: (a) specify and load the data set(s), (b) choose
the AT items, and (c) specify the output file and run the program.
AT Item Selection
As DIMTEST tests a unidimensionality hypothesis based on the covariances of pairs of AT
items conditional on PT items, the selection of AT items is quite important and can dramati-
cally affect the performance of DIMTEST and the results. Compared with the earlier
DIMTEST_DOS program, there are two major changes made in the current version of
DIMPACK. First, the earlier DIMTEST_DOS program divided the test into three required subt-
ests: AT, AT2, and PT. In DIMPACK, the AT2 requirement was eliminated and instead
replaced by an AT-based simulated data set. This change was apparently implemented to
enable the program to be used with shorter tests. Second, the DIMTEST_DOS program used a
linear factor analysis (FAC) for automatic selection of AT items. In DIMPACK, FAC was
replaced by ATFIND, a procedure combining CCPROX/HAC and DETECT analyses.
Two methods are available for selecting AT items in DIMPACK: (a) a confirmatory analysis
(called user specified in DIMPACK) and (b) an exploratory analysis. The confirmatory analysis
requires a preidentified list of AT items. If the users are unable to identify the AT items, they
can elect to use the exploratory analysis approach, which uses an automatic selection proce-
dure, ATFIND, to statistically identify the AT items. The exploratory analysis can either use
the current data set or accept as input an independent data set to carry out the AT item selec-
tion. Using the current data is the default. Unfortunately, it implies that the same data set is
used for AT item selection and the DIMTEST statistic calculation. In the study described further on, the authors found that using an independent data set was essential as cross-validation evidence and to avoid inflated Type I error rates due to capitalization on chance. To
use independent data (two separate data sets), the user needs to prepare beforehand a second
data set for input. One option might be to randomly divide the original data into two sets with
sampling ratios ideally between 1:2 and 1:1 (where a smaller data set can be used for AT item
selection and the larger data set for the DIMTEST analysis). Alternatively, users can manually
select the AT items, perhaps based on subject matter or expert judgment, or based on externally
run linear or nonlinear factor analyses. A minimum of 15 PT items is recommended in the DIMPACK help file. The DIMTEST output file (with the appended file name extension ".dim")
provides the AT and PT item lists, the DIMTEST T*-statistic and the associated p value.
Depending on the chosen Type I error level, users can decide whether to reject the null hypoth-
esis of essential unidimensionality.
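Preparing the two data sets amounts to a random split of examinees; the selection fraction, seed, and response matrix below are illustrative, and the split would then be written out in whatever fixed format DIMPACK expects.

```python
import numpy as np

def split_for_dimtest(responses, selection_fraction=1/3, seed=0):
    """Randomly split examinees into a (smaller) AT-selection set and a
    (larger) DIMTEST-analysis set; a fraction between 1/3 and 1/2 keeps
    the recommended 1:2 to 1:1 sampling ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(responses.shape[0])
    cut = round(responses.shape[0] * selection_fraction)
    return responses[order[:cut]], responses[order[cut:]]

# Illustrative 0/1 response matrix: 900 examinees on 40 items.
responses = np.random.default_rng(1).integers(0, 2, size=(900, 40))
at_set, analysis_set = split_for_dimtest(responses)
```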
DETECT
Similarly, there are confirmatory and exploratory approaches available in DETECT to partition
the items into clusters. Under the confirmatory approach, users must specify the number of clus-
ters and the items associated with each cluster. The interface allows users to either manually
select the items or to provide an input file (referred to as a "cluster file"). For the
exploratory approach, users can decide whether to do cross-validation, and furthermore, whether
to use one file or two files for the cross-validation analysis. If only one file is available for
cross-validation, the user must specify the number of examinees set aside for cross-validation;
DETECT will then randomly split the original group of examinees into two samples. The output
166 Applied Psychological Measurement 37(2)
file generated by DETECT, with a file name extension ".det" appended, provides the clusters
and their associated items. As noted earlier, no dimensionality hypothesis test is available in
DETECT.
CCPROX/HAC
The third interface consists of two sections: CCPROX and HAC. CCPROX should be used prior
to HAC as it provides the proximity matrix that is subsequently required as the input for the
HAC analysis. Users specify the analysis parameters for CCPROX. For example, if guessing is
assumed for a multiple-choice test, the software authors recommend using the estimated lowest number-correct score divided by the total number of items as the "guessing" parameter. In
addition, the user can specify whether CCPROX should produce CCOV, the proximity matrix
based on the conditional covariances, or conditional correlations (CCOR), the proximity matrix
based on product–moment correlations.
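The recommended guessing value is a one-line computation; the sketch below simply operationalizes that recommendation, with the minimum observed number-correct score standing in for the "estimated lowest" score (an assumption, since the software authors do not specify the estimator).

```python
import numpy as np

def guessing_parameter(responses):
    """Heuristic guessing value for CCPROX: lowest observed number-correct
    score divided by the total number of items (minimum observed score
    used here as a stand-in for the estimated lowest score)."""
    number_correct = responses.sum(axis=1)
    return number_correct.min() / responses.shape[1]

# Example: 5 examinees on a 10-item test; lowest number-correct is 2.
resp = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                 [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])
print(guessing_parameter(resp))  # 0.2
```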
The proximity matrix file (*.prx) will be automatically loaded if HAC is run immediately
after executing CCPROX. The user also has a wide variety of HAC options from which to
choose regarding the nature and type of cluster analysis conducted. The actual hierarchical clus-
ter analysis is based on the "Sequential Agglomerative Hierarchical Nonoverlapping (SAHN)" algorithm. The HAC output file is saved with the appended file name extension ".hac."
Simulation Studies
The earlier DOS versions of DIMTEST and DETECT have been widely disseminated and used
in many simulation studies that investigated their utility and performance. Previous studies tend
to show that DIMTEST sufficiently detects multidimensionality when the sample size is large,
when the test is quite long, and/or when the between-trait correlations are relatively low (e.g.,
Deng & Ansley, 2000). At least one study has shown that the Type I error is quite low
when there is no guessing or similar patterns of random responses evident in the data (Finch &
Habing, 2007). On the other hand, the performance of DIMTEST was found to be less satisfac-
tory with shorter tests or with data conforming to compensatory and partially compensatory
multidimensional IRT or factor analytic models (Hattie, Krakowski, Rogers, & Swaminathan,
1996; Meara, Robin, & Sireci, 2000). The literature does suggest that DETECT identifies sim-
ple structure effectively, but it does not perform well when the data exhibit complex item-factor
loading patterns (so-called ‘‘complex factor structures’’) and/or highly correlated dimensions.
To evaluate the new integrated DIMPACK package for this review, two simulation studies
were conducted: First, a Type I error study was conducted to evaluate the software in terms of
confirming test unidimensionality (where the null hypothesis is true). Second, a statistical power
study was conducted to evaluate the software in terms of detecting test multidimensionality
(where the alternative hypothesis, rejecting the unidimensionality assumption, is true).
the option "current data" should probably never be selected when conducting exploratory
analysis for AT selection in DIMPACK. At the very least, additional research is needed and
the default settings in DIMPACK should be reconsidered until the issues can be more fully
understood.
Although the Type I error rates under the studied conditions using "different data" (two independent data sets) showed substantial improvement, the large sample size (5,000 examinees) with short to moderate test lengths (30 or 50 items) still tended to produce large Type I error rates. For the smaller sample sizes (500 and 1,000 examinees), the Type I error rates tended to be too low (.00-.02) compared with the nominal level (α = .05), which suggests an increased potential for Type II errors when too few examinees are available. Interestingly, the results showed that both types of statistics produced in DIMTEST_DOS had much lower Type I error rates when the sample size was large.
where $a_{i1}$ and $a_{i2}$ are the slope parameters related to item i's discriminating power on Dimensions 1 and 2; $\theta_{j1}$ and $\theta_{j2}$ are person j's abilities on Dimensions 1 and 2; and $d_i$ is the intercept, which is negatively related to the difficulty parameter of item i.
The person abilities $\theta_{j1}$ and $\theta_{j2}$ were randomly drawn from a bivariate normal distribution with both means set to 0 and both variances set to 1. Conditions were crossed by three sample sizes (n = 500, 1,000, 5,000), three test lengths (30, 50, 100 items), and four degrees of correlation between the two dimensions (r = 0, 0.3, 0.6, 0.9), resulting in 36 conditions in total (3 × 3 × 4). Based on the results of the Type I error rate study, only the option of "different data" was used for AT selection in the statistical power study. One hundred replications were simulated in each condition, and the number of rejections out of 100 replications was recorded.
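The data-generation design can be sketched as follows; the item slopes, intercepts, and guessing value are illustrative placeholders, not the parameter values used in the study, and the loading pattern (each item on one dimension) is one simple-structure assumption among many possible.

```python
import numpy as np

def simulate_2d_data(n_persons, n_items, rho, seed=0):
    """Generate dichotomous responses under a compensatory two-dimensional
    logistic model; abilities are bivariate normal with means 0,
    variances 1, and correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons)

    # Illustrative item parameters: half the items load on Dimension 1,
    # half on Dimension 2 (simple structure).
    a = np.zeros((n_items, 2))
    a[: n_items // 2, 0] = 1.2
    a[n_items // 2 :, 1] = 1.2
    d = rng.normal(0.0, 1.0, size=n_items)  # intercepts
    c = 0.2                                 # illustrative guessing value

    logit = theta @ a.T + d                 # n_persons x n_items
    p = c + (1 - c) / (1 + np.exp(-logit))
    return (rng.random((n_persons, n_items)) < p).astype(int)

data = simulate_2d_data(n_persons=500, n_items=30, rho=0.3)
```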
Results (Figure 3) showed that DIMPACK had excellent power when the sample size was
large (5,000 examinees) and the correlations between the dimensions were low to moderate
(correlations of 0, 0.3, and 0.6). In general, the higher the correlation was, the larger the sample
size needed to be to achieve a satisfactory level of statistical detection power. A small sample size (500 examinees) combined with a short test had very low power; such designs are not recommended for practical work, at least based on the results of this study.
DIMTEST_DOS was also run on all of the generated multidimensional data. Compared with DIMTEST_DOS, DIMPACK ran more smoothly and without abnormal program terminations or interruptions. In contrast, DIMTEST_DOS could not analyze certain two-dimensional 3PL-generated data. Moreover, when DIMTEST_DOS processed a large volume of such data in a short period, as these simulations did, the executable for AT selection (ASN.EXE) crashed and the program failed to produce any valid results.
[Figure 1: four panels plotting Type I error rate (y-axis) against test length (x-axis: 30, 50, 100 items).]
Figure 1. Type I error rate for DIMPACK and DIMTEST_DOS with 2PL data
Note: 2PL = two parameter logistic model; AT = assessment subtest.
[Figure 2: four panels plotting Type I error rate (y-axis) against test length (x-axis: 30, 50, 100 items).]
Figure 2. Type I error rate for DIMPACK and DIMTEST_DOS with 3PL data
Note: 3PL = three parameter logistic model; AT = assessment subtest.
Although the earlier DOS programs may have been slightly more efficient in terms of apparent processing speed, possibly owing to choices of compilers and other technical programming issues, they failed to finish the analyses for some of the two-dimensional data studied by the authors.
Based on our experience, the DIMPACK package appears to have some major shortcomings
that need to be addressed before it can be endorsed as practically useful. First and foremost, it
was found that the option of "current data" in AT selection resulted in distressingly high Type I error rates under many typical situations that arise in practice; the option should therefore be discouraged and investigated further. Certainly, the authors would not use it in any of their research work. It is suggested that "different data" always be selected; in fact, it should become the new default data selection choice. In addition, it would be more convenient
for users if the software could provide an option to specify the exact number of examinees set
aside for AT selection and then automatically and randomly split the data into two sets for
cross-validation purposes. Second, the authors found that the statistical power was low under
many conditions except with very large sample sizes. This finding seems to limit the utility of
[Figure 3: four panels plotting power rate (y-axis) against test length (x-axis: 30, 50, 100 items).]
Figure 3. Power rate for DIMPACK with compensatory two-dimensional three parameter logistic data
the software for many practitioners dealing with small or moderate data sets. Third, some discrepancies were found between the DIMPACK and DIMTEST_DOS programs in the Type I error rates at large sample sizes. The discrepancies were substantial and deserve further investigation. Finally, successful applications and more widespread use may be hampered unless the
software developers provide better documentation with concrete examples of data files and
common applications.
Acknowledgment
The authors are grateful for the valuable comments from Richard M. Luecht, which strengthened the
review considerably.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
References
Deng, H., & Ansley, T. N. (2000, April). Detecting compensatory and noncompensatory multi-
dimensionality using DIMTEST. Paper presented at the meeting of the National Council on
Measurement in Education, New Orleans, LA.
Finch, H., & Habing, B. (2007). Performance of DIMTEST- and NOHARM-based statistics for testing
unidimensionality. Applied Psychological Measurement, 31, 292-307.
Froelich, A. G., & Habing, B. (2008). Conditional covariance-based subtest selection for DIMTEST.
Applied Psychological Measurement, 32, 138-155.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory.
Newbury Park, CA: SAGE.
Hattie, J., Krakowski, K., Rogers, H. J., & Swaminathan, H. (1996). An assessment of Stout’s index of
essential unidimensionality. Applied Psychological Measurement, 20, 1-14.
Kim, H. R. (1994). New techniques for the dimensionality assessment of standardized test data
(Unpublished doctoral dissertation). Department of Statistics, University of Illinois at Urbana-
Champaign, IL.
Meara, K., Robin, F., & Sireci, S. G. (2000). Using multidimensional scaling to assess the dimensionality
of dichotomous item data. Multivariate Behavioral Research, 35, 229-259.
Roussos, L., Stout, W., & Marden, J. (1998). Using new proximity measures with hierarchical cluster
analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1-30.
Stout, W. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52,
589-617.
Stout, W., Habing, B., Douglas, J., Kim, H., Roussos, L., & Zhang, J. (1996). Conditional covariance-based
nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.
Stout, W., Froelich, A., & Gao, F. (2001). Using resampling methods to produce an improved DIMTEST
procedure. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds.), Essays on item response
theory (pp. 357-376). New York, NY: Springer-Verlag.
Stout, W., Nandakumar, R., Junker, B., Chang, H.-H., & Steidinger, D. (1992). DIMTEST: A Fortran
program for assessing dimensionality of binary item responses. Applied Psychological Measurement,
16, 236.
William Stout Institute for Measurement. (2006). Nonparametric dimensionality assessment package
DIMPACK (Version 1.0) [Computer software]. St. Paul, MN: Assessment Systems Corporation.
Zhang, J., & Stout, W. (1999). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64, 213-249.