When the NES gives you lemons: Making the most of less-than-perfect data
Brad Jones
As social scientists, we are often forced to make the most of data that falls far short of the
ideal. Researchers who work with survey data are painfully familiar with having to make do with
much less than what they might have hoped for given greater resources or the opportunity to
field their own study. In this paper, I will show how, with a little extra work at the front end, we
can squeeze the most out of what we have to work with. In particular, I focus on modeling
multidimensional latent variables.
Ideology, for example, has long been theorized as consisting of multiple dimensions (Conover
and Feldman 1981; Jost, Federico, and Napier 2009). A recent article by Treier and Hillygus
(2009) confirms the idea that ideology falls along two dimensions. Personality research is
another example of a multidimensional construct that has received increased attention in recent
years (Baril and Stone 1984; Eysenck 1990; Gerber et al. 2010; Mondak et al. 2010; Olver and
Mooradian 2003; Tetlock 1981). Indeed, many of the scales and measures political scientists
routinely borrow from social psychology can be thought of as multidimensional latent variables.
In this paper, I will develop a multidimensional item response theory (MIRT) model for
ordered, polychotomous responses. Item response theory is growing in popularity among social
scientists as a method for measuring latent traits from batteries of questions—the type of data
routinely collected in the form of sample surveys.
However, many of the existing models for item response data impose restrictive assumptions that
the researcher may not always be comfortable with. Using simulation studies, I demonstrate how
taking advantage of the multidimensional nature of the latent variables one is trying to measure
can yield more efficient estimates.
Item response theory has quickly become almost commonplace in the political science
literature (Delli Carpini and Keeter 1993; Huber and Lapinski 2008; Jacoby 2008; Mondak 2001;
Treier and Jackman 2002; Treier and Jackman 2008). The method was originally
developed in educational testing as a way of measuring latent ability. Rather than just adding up
the total number of correct answers on an exam, item response theory suggests that we can learn
something about the nature of the items and produce better estimates of latent ability by taking
these item-level factors into account. It has since been extended beyond the simple dichotomous
case. The basic model I will use in this paper is derived from the Generalized Partial Credit
Model (GPCM) originally developed by Muraki (1992). I extend the model into multiple
dimensions.
The model assumes that responses to the ordered items are governed by a respondent's
latent trait, θ, which follows a multivariate normal distribution. Responses are modeled through a
logistic link. The GPCM models the probability of a respondent (indexed by i) selecting the kth option
on item j as

P(Y_{ij} = k \mid \theta_i) = \frac{\exp\left(\sum_{v=1}^{k} \alpha_j (\theta_i - \beta_{jv})\right)}{\sum_{c=1}^{K} \exp\left(\sum_{v=1}^{c} \alpha_j (\theta_i - \beta_{jv})\right)}

where \beta_{j1} \equiv 0, and K is the total number of response categories to item j. For the
purposes of my model, \theta_i gives the coordinates of the individual in the three-dimensional latent
space, and each question, j, is assumed to measure one dimension of that space. The model could
be extended into a compensatory item response model, where responses to any question could be
a function of more than one dimension of θ (Bolt and Lall 2003), but for my purposes, each item
loads on a single dimension. The \beta_{jk} describe the thresholds for the ordinal responses. One advantage of the
GPCM over other ordinal response models (the graded response model, for example) is that it
does not put any ordering constraint on the \beta_{jk}. The summation in the numerator of the model
runs over the first k thresholds, so each category's probability reflects the accumulated
\alpha_j(\theta_i - \beta_{jv}) terms.
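To make the response function concrete, here is a minimal Python sketch of the GPCM category probabilities (the paper's estimation is done in WinBUGS; the function name and parameter values below are hypothetical illustrations, not the paper's code):

```python
import numpy as np

def gpcm_probs(theta, alpha, beta):
    """Category probabilities under the generalized partial credit model.

    theta : scalar position on the dimension this item measures
    alpha : item discrimination
    beta  : length-K array of thresholds; beta[0] is fixed at 0
    """
    beta = np.asarray(beta, dtype=float)
    # Cumulative sums of alpha * (theta - beta_v) over the first k thresholds
    psum = np.cumsum(alpha * (theta - beta))
    # Normalize with a numerically stable softmax
    epsum = np.exp(psum - psum.max())
    return epsum / epsum.sum()

# A hypothetical six-category item
probs = gpcm_probs(theta=0.5, alpha=1.5, beta=[0.0, -1.0, -0.5, 0.0, 0.5, 1.0])
print(probs.sum())  # the six category probabilities sum to 1
```

Note that nothing in the sketch sorts beta: consistent with the GPCM's lack of an ordering constraint, the thresholds may appear in any order.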
Bock and Aitkin (1981) describe an EM procedure to estimate models such as the one
above, where θ appears on both the right- and left-hand sides. Item response models present a
challenge for normal maximum likelihood estimation because the number of parameters grows with
the number of respondents. The EM procedure can provide estimates of the item parameters that
maximize the likelihood of any particular response pattern by integrating over θ to produce
marginal maximum likelihood (MML) estimates. Muraki (1992) describes how this would work
for the generalized partial credit model he developed, and his solution could be extended to three
dimensions by integrating over a trivariate normal distribution. Estimation would proceed by
integrating over θ and iterating between different values of the item parameters at different points
in the three-dimensional space we are integrating across. Leung (2008) describes a Bayesian
approach to a similar three-dimensional model; with uninformative priors, this is analogous to
the MML solution. The Bayesian solution has the advantage of being relatively easier to
implement through WinBUGS. It also provides direct estimates of the θ and some measure of
uncertainty around each estimated parameter (Wollack et al. 2002). An EM algorithm would
require more serious programming to take the necessary integrals.
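As a rough illustration of what the Bock and Aitkin (1981) marginalization involves, the following Python sketch integrates one response pattern's likelihood over a one-dimensional standard normal trait using Gauss-Hermite quadrature. It is a deliberate simplification of the trivariate case discussed above, and all item values are hypothetical:

```python
import numpy as np

def gpcm_prob(theta, alpha, beta):
    """GPCM category probabilities for one item at a given theta."""
    psum = np.cumsum(alpha * (theta - np.asarray(beta, dtype=float)))
    e = np.exp(psum - psum.max())
    return e / e.sum()

def marginal_likelihood(responses, alphas, betas, n_nodes=21):
    """Likelihood of one response pattern with theta integrated out against a
    standard normal density, via Gauss-Hermite quadrature (1-D analogue of MML)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    thetas = x * np.sqrt(2.0)        # rescale nodes for the N(0, 1) density
    weights = w / np.sqrt(np.pi)     # rescaled weights sum to 1
    lik = 0.0
    for t, wt in zip(thetas, weights):
        pattern_prob = 1.0
        for resp, a, b in zip(responses, alphas, betas):
            pattern_prob *= gpcm_prob(t, a, b)[resp]
        lik += wt * pattern_prob
    return lik

# A hypothetical three-item, four-category questionnaire; one response pattern
betas = [[0.0, -1.0, 0.0, 1.0]] * 3
ml = marginal_likelihood(responses=[2, 1, 3], alphas=[1.0, 0.8, 1.2], betas=betas)
```

In an EM implementation this quantity would be recomputed at each iteration for every observed pattern; in three dimensions the quadrature grid grows as the cube of the number of nodes, which is part of why the Bayesian route is attractive.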
The prior for the latent trait is

\theta_i \sim \mathrm{MVN}(\mathbf{0}, \Sigma)

where \mathbf{0} is a column vector of zeros and \Sigma is the variance-covariance matrix. For identification,
the diagonal of \Sigma is constrained to be unity. There are different options to model the dependence
between the different elements of θ. One option is to model \Sigma as if it were drawn from an
inverse Wishart distribution. Alternatively, one could model each correlation separately. I choose
to follow the latter course. We can estimate two of the correlations unconstrained, but the third
must be restricted so that \Sigma remains positive definite:

\rho_{23} \in \left( \rho_{12}\rho_{13} - \sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)},\ \rho_{12}\rho_{13} + \sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)} \right)
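The admissible range for the third correlation can be checked numerically. This Python sketch, with hypothetical correlation values, mirrors the constraint implemented in the appendix BUGS code and confirms that values inside the interval keep the correlation matrix positive definite:

```python
import numpy as np

def rho23_bounds(rho12, rho13):
    """Interval of rho23 values keeping the 3x3 correlation matrix positive
    definite, given rho12 and rho13."""
    half_width = np.sqrt((1 - rho12**2) * (1 - rho13**2))
    center = rho12 * rho13
    return center - half_width, center + half_width

def is_pos_def(rho12, rho13, rho23):
    R = np.array([[1.0,  rho12, rho13],
                  [rho12, 1.0,  rho23],
                  [rho13, rho23, 1.0]])
    # A symmetric matrix is positive definite iff all eigenvalues are positive
    return bool(np.all(np.linalg.eigvalsh(R) > 0))

lo, hi = rho23_bounds(0.4, 0.6)
print(is_pos_def(0.4, 0.6, (lo + hi) / 2))  # True: inside the interval
print(is_pos_def(0.4, 0.6, hi + 0.05))      # False: outside it
```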
All MCMC simulations were run in WinBUGS via the "R2WinBUGS" package in R. I
used a relatively small number of iterations (300) with a "burn-in" of 50. The simulated dataset
on which all analyses were conducted had a sample size of 500. In its current implementation,
the method takes several hours to run with a k of 20. The relevant BUGS code is included
in the appendix.
Simulation Studies
I conducted a series of simulation studies systematically varying two factors: the αj parameter and the total number of
items in the questionnaire. In this paper, I am primarily interested in the differences between the
results of the one-dimensional approach and a multidimensional approach. Table 1 shows the
different conditions I will be examining. The conditions in the table represent real tradeoffs that
researchers are often forced to make. We often find ourselves in a position where we do not have
ideal measurement instruments. The αj parameter is one way of thinking about how well the particular
item measures the latent variable. In an ideal world, we would have unlimited questionnaire
space that we could fill with items that provide precise measurements of the latent variables we
are interested in studying. However, in reality, we often have only very limited space and less-than-ideal
items. Modeling the multiple dimensions of the latent variable directly should allow us to
borrow information across the different dimensions of the latent
variable to produce more efficient estimates. The method is more computationally intensive, but
if it produces real gains in efficiency, the extra time and resources devoted at the
analysis stage should be able to compensate for unavoidable problems at the design stage.
[Table 1 here]
Figures 1 and 2 show how different values of αj relate the latent variable, θ, to the
probability of giving a particular response in the GPCM. Comparing Figure 1 and Figure 2
reveals how the α parameter affects the probability of giving a particular response over the range
of θ. The β parameters were kept constant. Items with high values of α convey more information
about the respondent's position on the latent variable.
Results
In the analyses that follow, I will use simulated responses to four questionnaires
corresponding to the different columns in Table 1. The latent variable follows the multivariate
normal distribution:
The true values of the parameters were fixed, and responses follow the GPCM.
By examining the raw correlation between the predicted latent variable and the true
values from the simulation, we can begin to get an idea of the advantages of multidimensional item
response models. Table 2 shows the Pearson’s correlation between the predictions from the
estimation and the true values by the factors listed in Table 1. The table produces many relevant
comparisons. For my purposes, I am most interested in comparing the simultaneous and separate
estimations. A naïve estimate (an additive index of responses) is included for comparison.
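The naïve additive index is easy to reproduce in simulation. The following Python sketch, a simplified one-dimensional version with hypothetical parameter values rather than the paper's three-dimensional design, simulates GPCM responses and correlates the summed score with the true trait:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_item(theta, alpha, beta):
    """Draw one GPCM response per respondent for a single item."""
    psum = np.cumsum(alpha * (theta[:, None] - beta[None, :]), axis=1)
    e = np.exp(psum - psum.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    cum = p.cumsum(axis=1)
    u = rng.random((len(theta), 1))
    return (u > cum).sum(axis=1)  # category index 0..K-1

n, k = 500, 20                       # sample size and item count as in the paper
theta = rng.standard_normal(n)       # one latent dimension, for simplicity
beta = np.array([0.0, -1.5, -0.5, 0.5, 1.5, 2.0])
responses = np.column_stack([simulate_item(theta, 1.5, beta) for _ in range(k)])
naive = responses.sum(axis=1)        # the additive index
r = np.corrcoef(theta, naive)[0, 1]
print(r > 0.5)  # True: with many discriminating items the index tracks theta
```

With high α and many items this correlation is high, which is the pattern the naïve row of Table 2 reflects; what the sum score cannot do is recover the correlation structure across dimensions.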
[Table 2 here]
In every case, the simultaneous estimation outperforms the separate estimation. In some
situations, the improvement is dramatic. However, the naïve estimate performs remarkably well.
Indeed, as α or k increases, the correlation between the true values of the latent variable and the
naïve estimator rivals the most computationally intense simultaneous estimation. In most cases,
the naïve estimator actually outperforms the separate estimation of the latent variables. This
initial result seems disheartening on its face. If we can get a reasonably good approximation of
the latent construct by simply summing the responses to the questionnaire, why bother with
difficult estimation?
Where the simultaneous estimation shows its real advantage is in modeling the structure of the
latent variable. Table 3 shows the estimates of the
correlations between each element of the latent variable. The true correlations are listed across
the top of the table.
[Table 3 here]
Comparing the naïve estimate and the separate estimation in Table 3 reveals a similar story as
was told in Table 2. In nearly every case, the naïve estimate significantly outperforms the
separate estimation when it comes to modeling the dependence between the elements of the
latent variable. However, by estimating the correlations directly in the simultaneous estimation,
we see marked improvement. Even in the worst case where k is low and α is low, the estimated
correlations are remarkably close to the actual correlations. Indeed, in this sample, increasing the
values of k and α did not lead to dramatically better estimates of the correlations.
In most cases, we are interested in modeling latent constructs with the ultimate aim to
include them on the right-hand side of a regression equation. Using the true values of the
parameters, I created a simple model with a continuous dependent variable, which follows the
linear specification

y_i = \beta_1 \theta_{i1} + \beta_2 \theta_{i2} + \beta_3 \theta_{i3} + \varepsilon_i

where \varepsilon_i \sim N(0, \sigma^2) and the \beta coefficients were fixed at known values.
Table 4 shows the resultant estimates of the β obtained from the different modeling strategies.
The row labeled "True Values" contains the estimated coefficients when we use the true values
of the latent variables in the regression. The other cell entries are the coefficients obtained when
the estimated latent variables are used in their place.
[Table 4 here]
Comparing the results of Table 4 to those of Table 2 reveals just how important it is to
account for the correlations between the different dimensions of the latent variables. Although
the naïve and separate strategies produced reasonable approximations of the latent variables, they
led to regression coefficients that are substantially smaller (in absolute terms) than the true
values. Even with a small value for k composed of questions that only poorly measure the latent
variable, the simultaneous estimation produces estimates that are reasonably close to the true
values.
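The attenuation pattern described above is consistent with classical measurement error: regressing on a noisy estimate of a latent variable biases the coefficient toward zero. A minimal Python sketch with hypothetical values (not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
theta = rng.standard_normal(n)             # true latent variable
y = 2.0 * theta + rng.standard_normal(n)   # outcome with a true slope of 2

# A noisy estimate of theta, as a separate per-dimension estimation might yield
theta_hat = theta + 0.5 * rng.standard_normal(n)

def ols_slope(x, y):
    """Simple-regression slope: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(ols_slope(theta, y))      # close to the true value of 2
print(ols_slope(theta_hat, y))  # attenuated toward zero
```

With this amount of measurement error the expected attenuation factor is 1/(1 + 0.25) = 0.8, so the second slope sits near 1.6; modeling the latent structure directly, as the simultaneous estimation does, is one way to avoid paying this penalty.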
Conclusion
In this paper, I have developed a multidimensional item response theory model for ordinal
responses. I have shown that modeling the multidimensional nature of
the latent variable directly can add significantly to the quality of the final estimates. This is
especially evident when using the estimates obtained from the model as independent variables in
subsequent analyses.
There were several possibilities that I did not test directly that might make for interesting
future research. In all of my analyses, I assumed knowledge of the model that generated the data.
It would be worth investigating how violations of the modeling assumption affect subsequent
stages of the analysis. The GPCM is fairly flexible and seems a reasonable choice for most
ordered response data, but one can definitely imagine circumstances where a different model
would be more appropriate.
Another interesting avenue of investigation might be to vary the correlations between the
dimensions of the latent variables. I held that aspect of the data constant in my analyses here, but
it would be interesting to know how increasing or decreasing the correlations affects the other
parts of estimation. I would suspect that the multidimensional approach will show the most
benefit when the dimensions are strongly correlated.
It would also be interesting to discover how exploiting the multidimensional nature of the
latent variable might be able to compensate for poor measures across one or more dimensions. I
varied all values of α simultaneously, but there are several situations where we might have
excellent measures of one dimension and poor measures of another. As I showed here, the MIRT
approach produces valid estimates of the correlation structure of the latent variable even when
we have only a few, imprecise measurements (see Table 3). The MIRT method described in this
paper allows us to borrow strength from the other dimensions of the latent variable in the
estimation.
Multidimensional Item Response Theory models hold a great deal of promise for social
scientists. In this paper, I have shown through simulation studies how they might allow us to
transform a small number of noisy measures into estimates that we can have a great deal of
confidence in.
References
Baril, Galen L., and William F. Stone. 1984. "Mixed Messages: Verbal-Vocal Discrepancy in
Freshman Legislators." Political Psychology 5(1): 83-98.
Bock, R. Darrell, and Murray Aitkin. 1981. "Marginal Maximum Likelihood Estimation of Item
Parameters: Application of an EM Algorithm." Psychometrika 46(4): 443-459.
Bolt, Daniel M., and Venessa F. Lall. 2003. "Estimation of Compensatory and Noncompensatory
Multidimensional Item Response Models Using Markov Chain Monte Carlo." Applied
Psychological Measurement 27(6): 395-414.
Conover, Pamela Johnston, and Stanley Feldman. 1981. "The Origins and Meaning of
Liberal/Conservative Self-Identifications." American Journal of Political Science 25(4):
617-645.
Curtis, S. McKay. 2010. "BUGS Code for Item Response Theory." Journal of Statistical
Software 36: 1-34.
Delli Carpini, Michael X., and Scott Keeter. 1993. "Measuring Political Knowledge:
Putting First Things First." American Journal of Political Science 37(4): 1179-1206.
Gerber, Alan S. et al. 2010. "Personality and Political Attitudes: Relationships Across Issue
Domains and Political Contexts." American Political Science Review 104(1): 111-133.
Huber, Gregory A., and John S. Lapinski. 2008. "Testing the Implicit-Explicit Model of
Racialized Political Communication." Perspectives on Politics 6(1): 125-134.
Jost, John T., Christopher M. Federico, and Jaime L. Napier. 2009. "Political Ideology: Its
Structure, Functions, and Elective Affinities." Annual Review of Psychology 60: 307-337.
Leung, Shing-On. 2008. "A Three-Dimensional Latent Variable Model for Attitude Scales."
Sociological Methods & Research 37(1): 135-154.
Mondak, Jeffrey J. 2001. "Developing Valid Knowledge Scales." American Journal of Political
Science 45(1): 224-238.
Mondak, Jeffrey J. et al. 2010. "Personality and Civic Engagement: An Integrative Framework
for the Study of Trait Effects on Political Behavior." American Political Science Review
104(1): 85-110.
Muraki, Eiji. 1992. "A Generalized Partial Credit Model: Application of an EM Algorithm."
Applied Psychological Measurement 16(2): 159-176.
Olver, James M., and Todd A. Mooradian. 2003. "Personality Traits and Personal Values: A
Conceptual and Empirical Integration." Personality and Individual Differences 35(1): 109-125.
Tetlock, Philip E. 1981. "Personality and Isolationism: Content Analysis of Senatorial Speeches."
Journal of Personality and Social Psychology 41(4): 737-743.
Treier, Shawn, and D. Sunshine Hillygus. 2009. "The Nature of Political Ideology in the
Contemporary Electorate." Public Opinion Quarterly 73(4): 679-703.
Treier, Shawn, and Simon Jackman. 2002. "Beyond Factor Analysis: Modern Tools for Social
Measurement." Paper presented in Chicago, IL.
Treier, Shawn, and Simon Jackman. 2008. "Democracy as a Latent Variable." American Journal
of Political Science 52(1): 201-217.
Van Der Ark, L. Andries. 2001. "Relationships and Properties of Polytomous Item Response
Theory Models." Applied Psychological Measurement 25(3): 273-282.
Wollack, James A. et al. 2002. "Recovery of Item Parameters in the Nominal Response Model:
A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte
Carlo Estimation." Applied Psychological Measurement 26(3): 339-352.
[Figure 1 here: Low α (α = 0.5). Category response probabilities P(Y = 1) through P(Y = 6) plotted against θ over the range −4 to 4.]
[Figure 2 here: High α (α = 1.5). Category response probabilities P(Y = 1) through P(Y = 6) plotted against θ over the range −4 to 4.]
Table 2
Table 3
Note: cell entries are correlations between the estimated factors for the naïve and
separate estimations; the entries for the simultaneous estimation are the estimates
of the hyperparameters.
Table 4
Appendix
My code was adapted from Curtis (2010, 10). To estimate the model, I ran the following BUGS code:
model {
  for (i in 1:n) {
    for (j in 1:p) {
      Y[i, j] ~ dcat(prob[i, j, 1:6])
    }
    # This code ensures that the thetas will have the desired correlation structure
    theta1[i] ~ dnorm(0.0, 1.0)
    rand2[i] ~ dnorm(0.0, 1.0)
    theta2[i] <- theta1[i]*rho12 + rand2[i]*temp2
    rand3[i] ~ dnorm(0.0, 1.0)
    theta3[i] <- theta1[i]*rho13 + rand2[i]*temp3 + rand3[i]*pow((1 - rho13*rho13 - temp3*temp3), .5)
  }
  for (i in 1:n) {
    for (j in 1:p) {
      for (k in 1:6) {
        # This breaks down the model into manageable pieces.
        # The d1, d2, and d3 variables are row vectors of 0s and 1s indicating
        # which questions fall on which dimension.
        eta[i, j, k] <- alpha[j]*(theta1[i]*d1[j] + theta2[i]*d2[j] + theta3[i]*d3[j] - beta[j, k])
        psum[i, j, k] <- sum(eta[i, j, 1:k])
        epsum[i, j, k] <- exp(psum[i, j, k])
        prob[i, j, k] <- epsum[i, j, k]/sum(epsum[i, j, 1:6])
      }
    }
  }
  for (j in 1:p) {
    alpha[j] ~ dnorm(m.alpha, pr.alpha) I(0,)
    beta[j, 1] <- 0.0
    for (k in 2:6) {
      beta[j, k] ~ dnorm(m.beta, pr.beta)
    }
  }
  pr.alpha <- pow(s.alpha, -2)
  pr.beta <- pow(s.beta, -2)
  rho12 ~ dunif(-1, 1)
  rho13 ~ dunif(-1, 1)
  # This code ensures that the correlation matrix will be positive definite
  rho23.star ~ dunif(0, 1)
  min <- rho12*rho13 - pow(((1 - rho12*rho12)*(1 - rho13*rho13)), .5)
  max <- rho12*rho13 + pow(((1 - rho12*rho12)*(1 - rho13*rho13)), .5)
  range <- max - min
  rho23 <- rho23.star*range + min
  temp2 <- pow((1 - rho12*rho12), .5)
  temp3 <- (rho23 - rho12*rho13)/temp2
}