Structural Equation Modeling, 18:663–685, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 1070-5511 print/1532-8007 online
DOI: 10.1080/10705511.2011.607723

TEACHER’S CORNER

Bayesian Data-Model Fit Assessment for Structural Equation Modeling

Roy Levy
School of Social and Family Dynamics, Arizona State University

Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of
model construction and estimation in factor analysis, structural equation modeling (SEM), and
related latent variable models. However, model diagnostics and model criticism remain relatively
understudied aspects of Bayesian SEM. This article describes and illustrates key features of
Bayesian approaches to model diagnostics and assessing data–model fit of structural equation
models, discussing their merits relative to traditional procedures.

Keywords: Bayesian model checking, data–model fit, structural equation modeling

Bayesian modeling and estimation strategies are receiving an increasing amount of attention in
structural equation modeling (SEM) as viable if not preferable alternatives to more traditional
modeling approaches, particularly in complex situations (e.g., Ansari, Jedidi, & Dube, 2002;
Arminger & Muthén, 1998; Lee, 2007; Lee & Song, 2004; Lee, Song, & Tang, 2007; Lee &
Tang, 2006; Lee & Zhu, 2000; Scheines, Hoijtink, & Boomsma, 1999; Song & Lee, 2002;
Song, Lee, & Hser, 2009). The availability of software for conducting Bayesian estimation
via packages for SEM and related models (Arbuckle, 2006; L. K. Muthén & Muthén, 1998–
2010) and general-purpose software (Spiegelhalter, Thomas, Best, & Lunn, 2007), and the
publication of example code for such software (Lee, 2007) is likely to contribute to the growth
of applications of Bayesian approaches to SEM (Rupp, Dey, & Zumbo, 2004). The emergence of
Bayesian approaches to modeling and estimation in SEM carries with it the need for conducting
data–model fit analyses in these arenas. However, model diagnostics and model criticism

Correspondence should be addressed to Roy Levy, School of Social and Family Dynamics, P.O. Box 873701,
Arizona State University, Tempe, AZ 85287-3701, USA. E-mail: Roy.Levy@asu.edu


remain relatively underdeveloped aspects of Bayesian approaches to SEM. This article describes
Bayesian approaches to investigating data–model fit in SEM and illustrates them via an example
employing a simple factor model. Importantly, the methods discussed apply in essentially the
same manner to more complex models; indeed, among the strengths of several of the approaches is their wide applicability across a variety of modeling scenarios. The methods
discussed target the evaluation of a single model; we do not discuss Bayesian procedures
for conducting comparisons of multiple models (see, e.g., Gelfand, 1996; Spiegelhalter, Best,
Carlin, & van der Linde, 2002).
This article is intended to introduce readers to key ideas about and the relative strengths and
weaknesses of procedures for conducting data–model fit analyses in a Bayesian framework.

Code for all the analyses in the illustration can be downloaded from https://sites.google.com/
a/asu.edu/roylevy/papers-software. It is assumed that readers are familiar with principles of
Bayesian inference (briefly reviewed here). Accessible introductions can be found in Gelman,
Carlin, Stern, and Rubin (1995), Gill (2007), and Lynch (2007). A thorough treatment of
Bayesian SEM can be found in Lee (2007). This article is organized as follows. In the next
section, a Bayesian approach to factor analytic models is described, followed by the presentation
and discussion of several key components and paradigmatic approaches to Bayesian data–model
fit. The following sections describe the data and context of the example and present and discuss
the results of the example. A discussion of the approaches concludes the article.

BAYESIAN FACTOR ANALYSIS VIA SEM

In this section, basic principles and common practices of Bayesian analyses of factor models
and SEM are reviewed. In following subsections I present the basic factor model, briefly
review principles of Bayesian analysis in the context of this model, and then give an overview
of common estimation strategies. The reader can find further details and alternatives on model
formulation and estimation for Bayesian SEM in the referenced works (e.g., Arminger &
Muthén, 1998; Lee, 2007; Rowe, 2003; Scheines et al., 1999).

Model Formulation
Let X be an (N × J) matrix containing the potentially observable values from N subjects to J observable variables related to a set of latent variables via a factor analytic measurement model

X_i = \tau + \xi_i \Lambda' + \delta_i,  (1)

where X_i = (X_i1, ..., X_iJ) is the ith row of X, τ = (τ_1, ..., τ_J) is a (1 × J) vector of intercepts, ξ_i = (ξ_i1, ..., ξ_iM) is the (1 × M) vector of latent variables for subject i, Λ is a (J × M) matrix of loadings, and δ_i ~ N(0, Θ) is a (1 × J) vector of errors, where Θ is diagonal. The full collection of subjects' latent variables is given by Ξ = (ξ_1, ..., ξ_N), which has mean vector κ and covariance matrix Φ. Finally, let Ω = (Ξ, κ, Φ, τ, Λ, Θ) denote the full collection of all unknown entities in the model.
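To make the matrix notation concrete, the following R sketch generates data according to Equation 1 for a single-factor model; the dimensions and parameter values are illustrative assumptions rather than quantities taken from this article.

```r
## Minimal sketch of Equation 1 for a single-factor model (M = 1).
## All dimensions and parameter values are hypothetical choices for illustration.
set.seed(1)
N <- 200                                             # subjects
J <- 5                                               # observed variables
tau    <- c(3.3, 3.0, 3.9, 3.7, 4.6)                 # (1 x J) intercepts
Lambda <- matrix(c(1.0, 1.1, 0.8, 1.0, 0.4), J, 1)   # (J x M) loadings
Phi    <- 0.4                                        # latent (co)variance
Theta  <- diag(c(0.4, 0.4, 0.2, 0.3, 0.2))           # diagonal error covariance

xi    <- matrix(rnorm(N, mean = 0, sd = sqrt(Phi)), N, 1)        # latent variables (kappa = 0)
delta <- matrix(rnorm(N * J), N, J) %*% chol(Theta)              # errors, delta_i ~ N(0, Theta)
X <- matrix(tau, N, J, byrow = TRUE) + xi %*% t(Lambda) + delta  # X_i = tau + xi_i Lambda' + delta_i
```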

Bayesian Analysis
In a fully Bayesian analysis, all unknown quantities are treated as random.¹ Let P(Ω) denote the prior distribution of the unknown parameters; that is, a distribution chosen by the researchers to reflect substantive beliefs based on expert expectation or prior research. The posterior distribution of the unknown parameters, denoted P(Ω|Xobs) where Xobs are the observed values for the observable variables, constitutes the solution from fitting the model to the data and can be viewed as the result of updating the prior distribution in light of the information in the data. Bayes's theorem states that

P(\Omega \mid X^{obs}) = \frac{P(X^{obs} \mid \Omega) P(\Omega)}{P(X^{obs})} = \frac{P(X^{obs} \mid \Omega) P(\Omega)}{\int P(X^{obs} \mid \Omega) P(\Omega) \, d\Omega} \propto P(X^{obs} \mid \Omega) P(\Omega).  (2)

The first term in the numerator on the right side of Equation 2 is the likelihood. Given the
preceding description of the factor analytic model, this has the usual form of normal-theory
factor analysis. In a fully Bayesian analysis, all unknown entities are treated as random variables
and assigned prior distributions that make up the second term on the right side of the numerator
in Equation 2. In the work reported here, I specify normal prior distributions for the intercepts,
factor means, and loadings, and inverse-gamma distributions for the variances (Arminger &
Muthén, 1998; Lee, 2007; Rowe, 2003). Details on the specific choices for the prior distributions
are contained in the Appendix. Alternative choices for prior distributions have been proposed
(e.g., see Gelman, 2006, for discussions and recommendations on alternative choices for priors
on variance parameters in a variety of scenarios). The denominator in Equation 2 is the marginal
distribution of the observed data, which ensures that the posterior distribution integrates to 1.
Dropping the denominator on the right side of Equation 2 reveals that the posterior distribution
is proportional to the product of the likelihood and the prior.
As in traditional modeling strategies, identification of the location and scale of the latent
variables can be imposed by fixing values of parameters. Likewise, restrictions on parameters
based on theory (e.g., simple structure, parameter invariance, linear growth) can be imposed. I
do not delve further into issues surrounding the choices of priors or other restrictions because,
importantly, the data–model fit procedures that are the focus of this article are not limited to
specific choices for prior distributions, identifying restrictions or functional forms.

Estimating Posterior Distributions


Markov chain Monte Carlo (MCMC; Gilks, Richardson, & Spiegelhalter, 1996) estimation
routines capitalize on the proportionality relation in Equation 2 to empirically approximate the
posterior distribution and provide a flexible framework for estimation in Bayesian analyses
(Gelman et al., 1995; Gilks et al., 1996; Gill, 2007; Lynch, 2007). See Arminger and Muthén
(1998), Rowe (2003), and Lee (2007) for examples of complete derivations of MCMC sampling
schemes for SEM. For the current purposes, it is sufficient to note that, following a burn-in period of draws discarded prior to convergence, MCMC can be employed to obtain K draws Ω^(1)|Xobs, Ω^(2)|Xobs, ..., Ω^(K)|Xobs that form an empirical approximation that is, in the limit, equal to sampling from the posterior distribution P(Ω|Xobs).

¹In what has frequently been termed empirical Bayes estimation, certain unknown entities (i.e., those at the highest level of a hierarchical specification) are not assigned prior distributions, and instead are estimated using frequentist strategies (see, e.g., Gill, 2007, p. 425).

Figure 1 contains a schematic for
the process of empirically approximating the posterior distribution (and conducting data–model
fit assessment as discussed later) in a simulation environment. On the left side of each panel,

FIGURE 1 Schematic for conducting estimation and posterior predictive model checking in a simulation environment. Elements in the solid box estimate the posterior distribution; elements in the dashed box are the observed value of the test statistic (a) or estimate the distribution of realized values of the discrepancy measure (b); elements in the dotted box estimate the posterior predictive distribution; elements in the double-lined box estimate the posterior predictive distribution of the test statistic (a) or discrepancy measure (b). (a) PPMC for test statistics. (b) PPMC for discrepancy measures.

the prior distribution P(Ω) combines with Xobs to yield the posterior distribution P(Ω|Xobs); K draws from the posterior distribution (in the solid box) are obtained to approximate the
posterior. (The remaining aspects of Figure 1 are discussed as aspects of the data–model fit
procedures that are introduced in the following sections.)

BAYESIAN DATA–MODEL FIT METHODS

For purposes of exposition, I characterize two separate flows of argument for data–model fit.
The distinction between the two is somewhat artificial, in that they share much in common

theoretically and computationally. As discussed next, however, certain procedures can be


employed with one but not both approaches.

Bayesian Data–Model Fit Methods for Test Statistics


One approach to data–model fit assessment focuses on the extent to which the model recovers
or predicts features of the data, and thereby treats characteristics of the data (only). Briefly,
this approach seeks to answer the question “How well does the model recover or account for
this characteristic of the data?" The analysis proceeds by answering the following questions:

What is the characteristic or feature of the data of interest? In the Bayesian data–model fit literature, a quantity summarizing a characteristic of the data is referred to as a test statistic, and is denoted by T(X) to reflect that it is a function of the data only. The value of the test statistic using the observed data is denoted T(Xobs). In the illustration, we examine the mean and variance for each variable, as well as the covariance for each pair of variables.

Given the model, what would be expected for this characteristic or feature of the
data? That is, what does the model imply the characteristic or feature to be? This is often
represented via a reference distribution. In frequentist approaches, the reference distribution is
the traditional sampling distribution. Many of the Bayesian procedures discussed here could be characterized as representing different options for the choice of a reference distribution; that is, which distribution of X is used to evaluate T(X) to facilitate the comparison with T(Xobs). All of the procedures view X as conditional on Ω and produce a distribution of X by marginalizing that conditional distribution over a distribution of Ω. That is, the reference distribution is

P(X^{rep} \mid \cdots) = \int P(X \mid \Omega) P(\Omega \mid \cdots) \, d\Omega,  (3)

where the use of ··· after the conditioning bar indicates the distribution is possibly conditional on other terms. As discussed later, the procedures can be viewed as differing in their choice of what to condition on when choosing P(Ω|···). Conceptually, the first term on the right side of Equation 3 can be viewed as analogous to the frequentist notion of a sampling distribution (i.e., the sampling distribution of the data given the values of the model parameters). The second term on the right side reflects the Bayesian aspect, namely, by viewing the values of the model parameters in terms of a distribution, rather than in terms of a point (estimate).

In simulation environments, the reference distribution is empirically approximated by taking some number K of draws from the distribution, Ω^(1), ..., Ω^(K), where each Ω^(k) ~ P(Ω|···). To empirically approximate the reference distribution, each of the K draws for Ω is used in turn to generate a replicated data set via the model, Xrep(k) ~ P(X|Ω = Ω^(k)). The test statistic is then evaluated in each of these K replicated data sets to form the predictive distribution of the test statistic, T(Xrep(1)), ..., T(Xrep(K)).

How does the characteristic or feature of the data compare to what the model implies the characteristic or feature ought to be? The analysis proceeds by comparing the observed value T(Xobs) to the predictive distribution T(Xrep(1)), ..., T(Xrep(K)). Meaningful differences between the observed value T(Xobs) and the distribution of the predicted values T(Xrep(1)), ..., T(Xrep(K)) are indicative of data–model misfit. As illustrated later, this can be done graphically or via a p value, where p = P(T(Xrep) ≥ T(Xobs)). This p value is empirically approximated by evaluating the proportion of T(Xrep(1)), ..., T(Xrep(K)) that exceeds T(Xobs).
The following sections describe variations on this theme in terms of the choice of the
distribution for  in Equation 3 and the associated procedures. Greater emphasis is placed on
the procedures that have received the most attention or offer the most potential in SEM.

Posterior predictive model checking. Perhaps the most popular procedures for model
criticism in Bayesian analyses involve posterior predictive model checking (PPMC; Gelman,
Meng, & Stern, 1996; Meng, 1994; Rubin, 1984; see Guttman, 1967, for historical roots).
PPMC is the only method that has attracted attention in SEM (Lee, 2007; Scheines et al.,
1999) and related modeling approaches (Levy, Mislevy, & Sinharay, 2009; Sinharay, Johnson,
& Stern, 2006; Sinharay & Stern, 2003). As such, its features are discussed in some detail.
PPMC uses the posterior distribution P(Ω|Xobs) as the distribution for Ω in Equation 3 to generate posterior predicted data as the reference distribution

P(X^{postpred} \mid X^{obs}) = \int P(X \mid \Omega) P(\Omega \mid X^{obs}) \, d\Omega.  (4)

Conceptually, this is a reference distribution based on the model after the data have been incorporated. The resulting p value is referred to as the posterior predictive p value (ppost),

p_{post} = P(T(X^{postpred}) \ge T(X^{obs}) \mid X^{obs}).  (5)

Computationally, PPMC picks up on the use of MCMC to fit the model, as illustrated in Figure 1a. For each of the K draws from the posterior distribution, a replicated data set is generated through the model. For the kth draw from the posterior, Xpostpred(k) ~ P(X|Ω = Ω^(k)|Xobs), yielding the collection Xpostpred(1), ..., Xpostpred(K) as an empirical approximation to the posterior predictive distribution, as illustrated in the dotted box in Figure 1. The test statistic is calculated for each of these data sets (the double-lined box), and ppost is estimated as the proportion of the K draws in which the value in the posterior predicted data exceeds the observed value T(Xobs) (in the dashed box in Figure 1a).

There are two dominant perspectives on the interpretation of ppost. A hypothesis testing
perspective seeks to interpret ppost in the usual associated manner, where an upper-tailed (two-
tailed) test with significance level α is performed by rejecting the null hypothesis of data–model fit if ppost is less than α (or more extreme than α/2 or 1 − α/2 in a two-tailed test). Operating
from this perspective, Robins, van der Vaart, and Ventura (2000) examined the theoretical
properties under null conditions (Dahl, 2006) and showed that employing ppost to conduct
hypothesis testing could lead to conservative inferences; that is, with Type I error rates below
nominal α values. A number of the alternative procedures discussed in this article can be
viewed as attempts to overcome this limitation. Another perspective treats PPMC as diagnostic
in nature, in which the goal is to ascertain strengths and weaknesses of the model rather than

whether the model fits (Gelman, 2003). Such a perspective follows Box’s famed dictum that
all models are wrong and the more relevant issue is to characterize in what ways the model is
(and is not) wrong and the implications of using such a model (Box & Draper, 1987; Gelman,
2007). The results of PPMC are then viewed as diagnostic pieces of evidence for, rather than
a hypothesis test of, data–model (mis)fit (Gelman, 2003, 2007; Gelman et al., 1996; Gill,
2007; Stern, 2000). From this perspective, graphical representations are often employed (e.g.,
Gelman, 2004) as summaries of the results, and ppost is simply a way to summarize the results
numerically; it has little to do with the probability of rejecting a model in the already-known-to-be-false situation where the model is correct. Values close to 0 (or 1) constitute evidence that
the model underpredicts (overpredicts) the characteristics captured by the discrepancy measure.
For discussions on the possible interpretations of ppost, the interested reader is referred to Meng
(1994), Rubin (1996), Bayarri and Berger (2000a, 2000b), Robins et al. (2000), Stern (2000),
Berkhof, van Mechelen, and Gelman (2004), Gelman (2003, 2007), and Johnson (2007).

Prior predictive model checking. The earliest of the formalized Bayesian methods is
prior predictive model checking (Box, 1980), which uses the prior distribution for Ω in Equation
3 to generate the predictive distribution. In practice, Monte Carlo techniques are utilized to
draw values of  from the prior distribution. For each draw, a replicated data set is generated
via the model and the drawn value for Ω, yielding the prior predictive distribution of the data.
The prior predictive method has advantages in that it is conceptually simple, does not require
heavy computation, and does not require the model to be fit to the current data set. However,
the prior predictive method suffers in that it is completely dependent on the choice of the prior.
Because it uses the prior distribution for Ω to generate the predictive distribution, the method essentially assesses whether the prior is an adequate representation of the data. In cases where the prior is purposefully constructed to reflect substantive theory to be assessed, this very well might be appropriate. However, analysts are typically interested in whether the posterior is an adequate representation of the data. Thus prior predictive model checking misses the mark; this is especially so when the analyst employs highly diffuse priors in an effort to have
the data more fully determine the posterior. I do not discuss prior predictive model checking
further, although the prior predictive distribution is employed to obtain calibrated posterior
predictive p values, discussed next.

Calibrated ppost values. In response to the hypothesis-testing-oriented criticism of ppost,


Hjort, Dahl, and Steinbakk (2006) introduced a method of calibrating ppost values so that the

resulting quantity will yield Type I error rates at the nominal α level. The calibrated ppost value is

p_{cpost} = P(p_{post}(X) \le p_{post}(X^{obs})),  (6)

where the distribution of X is given by the prior predictive distribution. Conceptually, this
approach compares the value of ppost from the real data to values of ppost that would be
observed under the null condition that the prior distribution is correct. Although conceptually
simple, obtaining pcpost can be extremely computationally intensive as it involves a double-
simulation approach. First, the value of ppost for Xobs can be obtained as previously described.

Second, some large number L draws from the prior predictive distribution can be obtained
using the prior distribution for Ω in Equation 3 as discussed in the previous section. PPMC is
then conducted, treating each of these L prior predicted data sets as the observed data, and a
ppost value is obtained L times. The pcpost value is then estimated as the proportion of these L
values that are less than or equal to ppost based on Xobs . The entire process requires obtaining
a total of (L + 1)K draws from the posterior, which can be computationally demanding. (The
process also includes the generation of L prior predictive data sets, which usually takes only
a trivial amount of time.)
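A minimal sketch of this double-simulation logic is given below; every helper named in the comments (draw_prior, generate_data, fit_posterior, ppmc_p) is a hypothetical placeholder for whatever prior-sampling, data-generating, and MCMC machinery the analyst actually uses, not a function from any particular package.

```r
## Sketch of the double simulation behind pcpost, using hypothetical helpers:
##   draw_prior()            one draw of Omega from the prior
##   generate_data(omega, N) one data set of size N from the model given Omega
##   fit_posterior(X)        K posterior draws given data X (e.g., by calling the MCMC software)
##   ppmc_p(X, draws)        ppost for the chosen statistic, as in the PPMC sketch above
p_post_obs <- ppmc_p(X_obs, fit_posterior(X_obs))    # ppost for the real data

L <- 300
p_post_ref <- numeric(L)
for (l in seq_len(L)) {
  omega_l <- draw_prior()                            # prior draw
  X_l     <- generate_data(omega_l, nrow(X_obs))     # prior predictive data set
  p_post_ref[l] <- ppmc_p(X_l, fit_posterior(X_l))   # ppost treating X_l as the observed data
}
p_cpost <- mean(p_post_ref <= p_post_obs)            # calibrated p value (Equation 6)
```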

Partial posterior predictive values. Bayarri and Berger (2000a) introduced the partial
posterior predictive p value, pppost, which uses the partial posterior distribution for Ω in Equation 3:

P(X^{partpostpred} \mid X^{obs}) = \int P(X \mid \Omega) P(\Omega \mid X^{obs} \setminus T(X^{obs})) \, d\Omega.  (7)


This differs from PPMC in that it does not employ the posterior distribution of the model parameters as the distribution for Ω, but instead employs the partial posterior distribution P(Ω|Xobs \ T(Xobs)), where the use of the \ indicates that the distribution is partialing out T(Xobs). Conceptually, this approach employs a distribution for Ω that results after the data have been accounted for in terms of T(Xobs). The partial predictive distribution can therefore be thought of as the (posterior) distribution of Ω based on the information available in Xobs above and beyond what is captured by T(Xobs). The partial posterior predictive p value is defined as

p_{ppost} = P(T(X^{partpostpred}) \ge T(X^{obs})).  (8)

To compute the partial posterior predictive p value in a simulation environment such as MCMC,
one must first generate values from the partial posterior distribution P(Ω|Xobs \ T(Xobs)) and then use those values to simulate data through the model P(X|Ω), akin to what is done in
PPMC. The distinction is that the analyst must employ the partial posterior distribution in the
first step. Draws from the partial posterior can be obtained via the construction of a Metropolis
chain (Gilks et al., 1996); see Bayarri and Berger (2000a) for details.
pppost is advantageous in that, unlike ppost, it will maintain Type I error rates in the classical
sense (Bayarri & Berger, 2000a; Robins et al., 2000). However pppost was developed for test
statistics and is not applicable to the more general class of discrepancy measures (discussed
later) that employ the model parameters in their calculation (Berkhof et al., 2004), including

many measures that are popular in SEM. Furthermore, pppost does not support the use of
diagnostic approaches such as residuals based on posterior predictive checks, nor does it support
graphical representations of results (Bayarri & Berger, 2000b; Stern, 2000). In addition, the
mechanism to obtain draws from the partial predictive distribution relies on a Metropolis
sampler after draws from the posterior distribution have been obtained, further adding to the
computational burden.
Bayarri and Berger (2000a) also proposed a conditional posterior predictive p value. This
is more difficult to compute than pppost and is equivalent to pppost in many cases, including
linear models assuming normality frequently employed in SEM. For these reasons, it is not
recommended for general usage and is not discussed in detail here.

Cross-validated posterior predictive model checking. In a cross-validation approach


to Bayesian model checking, the adequacy of a model is checked not against the data employed
in fitting the model, but in terms of predicting new data (Evans, 1997, 2000). As a modification
of PPMC using well-known principles of cross-validation (Cudeck & Browne, 1983), we only briefly discuss this approach. To begin, Xobs is split such that a subset of the observations, denoted Xobs(r), is set aside and the remaining observations, denoted Xobs(−r), are used to estimate the posterior distribution (e.g., via MCMC), which is then used as the distribution for Ω
in Equation 3 to construct the predictive distribution, similar to PPMC. The posterior predictive
distribution for the test statistic is based on this predictive distribution and the observed value
is computed using the previously set aside observations Xobs(r). A cross-validated p value can
then be calculated analogously to Equation 5.
The cross-validated approach has advantages in separating the data used to estimate the
posterior distribution from the data used to evaluate the model, resolving the difficulties
encountered with classical interpretations of ppost (Evans, 2000). A natural choice for the
division of data is a random split in half. The cross-validated approach suffers in that the
posterior distribution is based on only a subset of the data Xobs(−r) and different splits of the
data could lead to substantively different conclusions.

Bayesian Data–Model Fit Methods for Discrepancy Measures


The second approach to data–model fit assessment departs from the first by explicitly building
in the comparison between the observed data and model’s implications from the outset. Whereas
the first approach focuses on the characteristics of the data, here the focus is on the discrepancy
between the data and model. The analysis proceeds by answering the following questions:

How discrepant are the data and the model, as summarized by some quantity? In
the Bayesian data–model fit literature, such a quantity is referred to as a discrepancy measure,
and is denoted by D(X; Ω) to reflect that it might be a function of the data and the model
parameters.
This work examines the following discrepancy measures. To target the discrepancy between
observed and model-implied pairwise relationships, we consider the model-based correlation

between variables X_j and X_j′,

\mathrm{MB\text{-}Cor}_{jj'} = \frac{\sum_{i=1}^{N} (X_{ij} - E(X_{ij} \mid \Omega))(X_{ij'} - E(X_{ij'} \mid \Omega))}{\sqrt{\sum_{i=1}^{N} (X_{ij} - E(X_{ij} \mid \Omega))^2} \sqrt{\sum_{i=1}^{N} (X_{ij'} - E(X_{ij'} \mid \Omega))^2}},  (9)

where E(X_ij|Ω) is the model-implied expectation for the value from subject i for variable j. Given the assumptions of the factor analytic model, E(X_ij|Ω) = τ_j + ξ_i λ_j′.
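As one concrete (and hypothetical) implementation, the R sketch below evaluates Equation 9 for a single posterior draw of a one-factor model; the draw object d is assumed to carry the intercepts, loadings, and sampled latent variables for that draw.

```r
## Sketch of Equation 9 for one posterior draw. `d` is a hypothetical draw holding
## tau (length J), Lambda (J x 1), and xi (N x 1, the drawn latent variables).
mb_cor <- function(X, d, j, jp) {
  E   <- matrix(d$tau, nrow(X), length(d$tau), byrow = TRUE) + d$xi %*% t(d$Lambda)  # E(X_ij | Omega)
  rj  <- X[, j]  - E[, j]                   # residuals for variable j
  rjp <- X[, jp] - E[, jp]                  # residuals for variable j'
  sum(rj * rjp) / (sqrt(sum(rj^2)) * sqrt(sum(rjp^2)))
}
## Realized value for the first draw and one pair of variables (here the 3rd and 5th):
## mb_cor(X_obs, draws[[1]], 3, 5)
```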
As a measure of the overall fit of the covariance structure of the model, we employ the
popular likelihood ratio (LR) fit function (e.g., Scheines et al., 1999),

\mathrm{LR} = (N - 1)\left[\log \lvert \Sigma(\Omega) \rvert + \mathrm{tr}\left(S \Sigma(\Omega)^{-1}\right) - \log \lvert S \rvert - J\right],  (10)

where S is the usual sample covariance matrix and Σ(Ω) is the model-implied covariance matrix. We also consider the standardized root mean square residual (SRMR),

\mathrm{SRMR} = \sqrt{\frac{2 \sum_{j=1}^{J} \sum_{j'=1}^{j} \left[(s_{jj'} - \sigma(\Omega)_{jj'}) / (s_{jj} s_{j'j'})\right]^2}{J(J+1)}},  (11)

where s_jj′ and σ(Ω)_jj′ are the elements in row j and column j′ of S and Σ(Ω), respectively.
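The R sketch below computes Equations 10 and 11 from a sample covariance matrix and a model-implied covariance matrix; constructing Σ(Ω) as ΛΦΛ′ + Θ from a one-factor draw, as in the usage comment, is an illustrative assumption.

```r
## Sketch of the LR fit function (Equation 10) and SRMR (Equation 11).
lr_stat <- function(S, Sigma, N) {
  J <- nrow(S)
  (N - 1) * (log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - J)
}
srmr_stat <- function(S, Sigma) {
  J <- nrow(S)
  idx   <- which(lower.tri(S, diag = TRUE), arr.ind = TRUE)   # elements with j' <= j
  resid <- (S[idx] - Sigma[idx]) / (diag(S)[idx[, 1]] * diag(S)[idx[, 2]])
  sqrt(2 * sum(resid^2) / (J * (J + 1)))
}
## Usage for one hypothetical posterior draw `d`:
## Sigma_d <- d$Lambda %*% d$Phi %*% t(d$Lambda) + d$Theta
## lr_stat(cov(X_obs), Sigma_d, nrow(X_obs)); srmr_stat(cov(X_obs), Sigma_d)
```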
When evaluated using the observed data, the values of a discrepancy measure are referred
to as the realized values. In frequentist approaches, there is only one realized value of the
discrepancy measure, calculated by evaluating D(Xobs; Ω), using a point estimate in place of Ω. This can similarly be conducted in a Bayesian analysis by treating a summary of the
posterior distribution as a point estimate to be used in evaluating a discrepancy measure. For
example, Lee (2007) illustrated residual analyses in SEM by using the posterior mean of Ω as
a point estimate to facilitate the computation of residuals. These procedures are not discussed
in detail here, as the focus of this work is on explicating paradigmatic Bayesian approaches to
data–model fit, which are characterized by the use of a distribution for Ω.
When a distribution for Ω is employed, we obtain a distribution of realized values. For
PPMC, the posterior distribution is used. The posterior distribution of realized values is obtained
by evaluating the discrepancy measure (using the observed data) in the posterior distribution.
More formally, the posterior distribution for a discrepancy measure is given by

P(D(X^{obs}; \Omega) \mid X^{obs}) = \int D(X^{obs}; \Omega) P(\Omega \mid X^{obs}) \, d\Omega.  (12)

This posterior distribution can be estimated in a simulation environment such as MCMC by evaluating D(Xobs; Ω) using the draws from the posterior distribution, Ω^(1)|Xobs, ..., Ω^(K)|Xobs. The collection D(Xobs; Ω^(1)|Xobs), ..., D(Xobs; Ω^(K)|Xobs) constitutes an empirical approximation to the posterior distribution of the discrepancy measure, and these values are referred to as the

realized values of the discrepancy measure. This is illustrated in Figure 1b, where the draws
from the posterior distribution (the solid box) are used with Xobs to obtain the realized values
in the dashed box. Note that whereas with test statistics there is a single observed value,
discrepancy measures have a distribution of realized values based on the (posterior) distribution
of the parameters.
The use of posterior distributions of discrepancy measures allows for point and interval
estimates of the discrepancy measures, as well as direct probabilistic statements regarding
these quantities. In some cases, the discrepancy measure carries with it natural interpretations
(e.g., MB-Cor, SRMR) and can be used directly to facilitate inference. For other discrepancy
measures, the posterior distribution stands in need of a referent. In these situations, the analysis

proceeds to the following steps, mimicking those discussed previously for test statistics.

Given the model, what would be expected for this discrepancy measure? That is,
conditional on the model, what do we expect the values of the discrepancy measures to be?
As with test statistics, this is often represented via a reference distribution, which is generated
by modeling the replicated data as conditional on Ω and choosing a distribution of Ω to
marginalize over (see Equation 3). Although there are a number of choices for the reference
distribution for test statistics, the only method explicitly developed for the more general class
of discrepancy measures is PPMC (and its extensions, e.g., obtaining the calibrated p value,
pcpost , or the cross-validation approach), which uses the posterior distribution. The posterior
distribution is therefore used in the calculation of the predictive values of the discrepancy
measure. As discussed earlier, the reference distribution is Xpostpred(k) ~ P(X|Ω = Ω^(k)|Xobs).
These posterior predicted data sets are then used in combination with the associated draws from
the posterior distribution in the computation of the discrepancy measure to yield D(Xpostpred(1); Ω^(1)|Xobs), ..., D(Xpostpred(K); Ω^(K)|Xobs), which form an empirical approximation to the posterior predictive distribution of D(X; Ω), as illustrated in the double-lined box in Figure 1b.
The other methods discussed earlier could, in principle, be utilized with discrepancy measures.
However, this has not been formally developed for partial predictive model checking or
conditional predictive model checking and therefore it is not clear that the properties of pppost
for test statistics generalize to discrepancy measures.

How does the realized discrepancy compare to what the model implies the discrepancy ought to be? The analysis proceeds by comparing the realized values D(Xobs; Ω^(1)|Xobs), ..., D(Xobs; Ω^(K)|Xobs) to the predictive distribution D(Xpostpred(1); Ω^(1)|Xobs), ..., D(Xpostpred(K); Ω^(K)|Xobs). This can be done graphically or via a p value, where ppost = P(D(Xpostpred; Ω|Xobs) ≥ D(Xobs; Ω|Xobs)). In a simulation environment such as MCMC, the ppost value can be empirically estimated as the proportion of the K draws in which the predicted discrepancy D(Xpostpred(k); Ω^(k)|Xobs) exceeds the realized discrepancy D(Xobs; Ω^(k)|Xobs).
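A sketch of this pairing of realized and predicted discrepancies, using SRMR and the hypothetical objects from the earlier sketches (draws, X_obs, srmr_stat, and the generate_data helper), might look as follows.

```r
## Sketch of PPMC for a discrepancy measure (SRMR), pairing each posterior draw
## with a replicated data set; all objects are the hypothetical ones defined above.
K <- length(draws)
D_real <- D_pred <- numeric(K)
for (k in seq_len(K)) {
  d <- draws[[k]]
  Sigma_k <- d$Lambda %*% d$Phi %*% t(d$Lambda) + d$Theta   # Sigma(Omega^(k))
  X_rep   <- generate_data(d, nrow(X_obs))                  # replicated data through the model
  D_real[k] <- srmr_stat(cov(X_obs), Sigma_k)               # realized discrepancy
  D_pred[k] <- srmr_stat(cov(X_rep), Sigma_k)               # posterior predicted discrepancy
}
p_post <- mean(D_pred >= D_real)                            # proportion of draws above the unit line
plot(D_real, D_pred); abline(0, 1)                          # scatterplot as in Figure 4b
```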

DATA, MODEL, AND ESTIMATION FOR ILLUSTRATION

The use of posterior distributions, graphical representations of PPMC, and ppost, pcpost, and
pppost in assessing data–model fit are illustrated via an analysis of data from a sample of 675
students at a large, for-profit, open enrollment university responding to a Web-based version of

TABLE 1
Sample Covariance Matrix and Means for the
Illustration Data

Subscale PI FI AD FC IGC

PI 0.71
FI 0.42 0.77
AD 0.30 0.30 0.37
FC 0.35 0.47 0.31 0.65
IGC 0.16 0.13 0.18 0.17 0.22
M 3.33 2.99 3.88 3.67 4.59

Note. PI = Peer Interaction; FI = Faculty Interaction; AD = Academic and Intellectual Development; FC = Faculty Concern; IGC = Institutional Goal Commitment.

the Institutional Integration Scale (Pascarella & Terenzini, 1980). The scale measures students’
collegiate experiences with and perceptions of peers, faculty, intellectual growth, and academic
goals via 31 5-point Likert-type items organized into five subscales: Peer Interaction, Faculty
Interaction, Academic and Intellectual Development, Faculty Concern, and Institutional Goal
Commitment. Summary statistics for the subscales are given in Table 1.
A one-factor model is employed because evidence from other samples suggests a two-factor
model might provide adequate fit (French & Oakes, 2004) and the use of a more restrictive
model should then yield misfit to be detected, as is the focus of this work. Importantly, the
methods are not limited to the choice of this model (or the choices of prior distributions, form
of the likelihood, etc.) and apply directly to more complex models. In fitting the single-factor
model, the location and scale of the latent variable are identified by fixing κ1 = 0 and λ11 = 1.
Relatively diffuse prior distributions are employed for the remaining parameters (see Appendix).
The model was estimated using three chains from dispersed starting values in WinBUGS
(Spiegelhalter et al., 2007). An analysis of the history of the values (Brooks & Gelman,
1998) indicated that 500 iterations were sufficient to burn in the chains. Following this burn-in
period, 1,000 draws for each chain (totaling 3,000 draws) were used to empirically estimate the
posterior distribution, as summarized in Table 2. These draws were then used to conduct PPMC
and calculate ppost . To calculate pcpost , 300 data sets were drawn from the prior predictive
distribution and treated as the observed data in conducting MCMC and PPMC, yielding p
values used as a reference for ppost based on the observed data. Using methods described in
Bayarri and Berger (2000a), a Metropolis sampler was run for 3,000 iterations to estimate the
partial posterior distribution to calculate pppost. All p values reported here are to be interpreted
as estimates based on these empirical reference distributions.
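As a sketch of such a convergence check, chains exported from WinBUGS can be summarized with the coda package in R; the object chain_list below is an assumed list of per-chain draw matrices (iterations by parameters), not something produced by the article's code.

```r
## Hypothetical convergence check on three chains using the coda package.
library(coda)
chains <- mcmc.list(lapply(chain_list, mcmc))   # chain_list: assumed list of draw matrices
gelman.diag(chains)    # potential scale reduction factors (Brooks & Gelman, 1998)
traceplot(chains)      # history plots used to judge the burn-in period
```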

DATA-MODEL FIT RESULTS

Selected results are presented to illustrate features of the methods. The results are organized
by the aspects of data–model fit targeted by the discrepancy measures. In addition, to facilitate
a comparison between the Bayesian and traditional approaches to data–model fit, the model
was estimated using maximum likelihood (ML) in EQS (Bentler, 2004) and commonly used

TABLE 2
Summary of the Posterior Distribution

Parameter   M      SD     95% Credibility Interval
λ_PI        1.00   NA     NA
λ_FI        1.05   0.06   (0.94, 1.17)
λ_AD        0.76   0.04   (0.69, 0.85)
λ_FC        1.01   0.06   (0.91, 1.13)
λ_IGC       0.43   0.03   (0.37, 0.50)
τ_PI        3.33   0.03   (3.26, 3.40)
τ_FI        2.99   0.03   (2.93, 3.06)
τ_AD        3.88   0.02   (3.84, 3.93)
τ_FC        3.67   0.03   (3.60, 3.73)
τ_IGC       4.59   0.02   (4.56, 4.63)
σ²_ξ1       0.42   0.04   (0.35, 0.50)
θ_PI        0.38   0.02   (0.34, 0.43)
θ_FI        0.36   0.03   (0.32, 0.41)
θ_AD        0.17   0.01   (0.15, 0.19)
θ_FC        0.27   0.02   (0.24, 0.31)
θ_IGC       0.18   0.01   (0.16, 0.20)

Note. PI = Peer Interaction; FI = Faculty Interaction; AD = Academic and Intellectual Development; FC = Faculty Concern; IGC = Institutional Goal Commitment.

SEM data–model fit indexes were evaluated using the ML estimates (Hu & Bentler, 1999). The results of this analysis indicate suspect fit: LR = 164.60, which, based on a null χ² distribution with 5 degrees of freedom, yields p < .0001; SRMR = .06; Comparative Fit Index (CFI) = .89; root mean square error of approximation (RMSEA) = .22, with a 90% confidence interval of (.19, .25).

Centrality and Variability


Figure 2 depicts a smoothed density for the posterior predictive distribution of the variance of
the faculty interaction variable. The vertical line corresponds to the variance from the observed
data Xobs . The observed value falls in the middle of the distribution, indicating that the observed
value is quite consistent with the distribution of model-implied values. The estimated ppost
value, .50, is the proportion of the posterior predicted values greater than the realized value.
This result supports the inference that the variance of faculty interaction is well accounted for
by the model. The results for remaining variables’ variances and the variables’ means (not
shown) are similar to that in Figure 2 and support the conclusion that the means and variances
are well accounted for by the model.

FIGURE 2 Posterior predictive distribution of the variance of faculty interaction with the realized value plotted as the vertical line.

Overall Model Fit of the Covariance Structure

Figure 3 contains a scatterplot of the realized and posterior predicted values of LR. That is, the values along the horizontal axis denote the value of LR based on (Xobs; Ω^(1)|Xobs), ..., (Xobs; Ω^(3000)|Xobs) and the values along the vertical axis denote the value of LR based on (Xpostpred(1); Ω^(1)|Xobs), ..., (Xpostpred(3000); Ω^(3000)|Xobs). The distribution of the posterior predicted values along the vertical axis represents the sampling variability of LR over samples that are consistent with the model, marginalizing over the posterior distribution.
The unit line is added as a reference. If the points were randomly scattered about the line,
that would indicate that the realized and posterior predicted values were of similar magnitudes.
Here, it is clearly seen that the realized values are considerably larger than the posterior
predictive values. The ppost value is the proportion of draws above the line, which in this
case was 0.00; the pcpost value was also 0.00. This indicates that the discrepancy between the

FIGURE 3 Scatterplot of realized and posterior predicted values of likelihood ratio.



observed and model-implied covariance structure is larger than what is expected based on the
model.
A similar conclusion would be reached by using LR in a traditional approach. Using the ML
estimates as point estimates, LR = 164.60, which, based on an assumed null χ² distribution with 5 degrees of freedom, yields p < .0001. It is noteworthy that this value of LR
falls just below the smallest of the realized values of LR from the posterior distribution,
which was 166.40 (depicted as the leftmost point in Figure 3). This occurs because whereas
ML estimates are obtained by a process that minimizes LR, the Bayesian analysis yields a
distribution of LR (i.e., the distribution of realized values), generated by evaluating LR over
the posterior distribution of the values for the parameters. Similarly, whereas the traditional

approach uses a reference distribution that captures the variability in LR owing to sampling
variability of the data, the reference distribution in the Bayesian analysis (i.e., the posterior
predictive distribution) captures the variability of LR due to the sampling variability of the
data and the variability of the parameters, given by the posterior distribution. This underscores
the point that, in contrast to the traditional approach, the Bayesian approach explicitly builds
the uncertainty associated with the estimation of the model parameters into the data–model fit
procedures.
Turning to SRMR, the results from the traditional and Bayesian analyses are somewhat
contradictory. Unlike LR, SRMR is typically employed in SEM by an appeal to cutoff values in
traditional approaches (e.g., .08; Hu & Bentler, 1999), rather than a reference distribution. In
this example, using the ML estimates in a traditional framework, SRMR = .06, which would
constitute evidence of adequate fit. The posterior distribution for the realized values of SRMR
from the Bayesian analysis is depicted in the first panel of Figure 4, and has a posterior mean
of .25 and a 95% highest posterior density interval of (.20, .32). The use of the posterior distribution
supports the strategy of evaluating SRMR relative to a cutoff in a probabilistic manner; here

(a) (b)
FIGURE 4 Results for standardized root mean square residual (SRMR). (a) Posterior distribution of the
realized values of SRMR. (b) Scatterplot of realized and posterior predicted values of SRMR.
678 LEVY

P (SRMR  .08) D 0.00. In a PPMC framework, the distribution of realized SRMR values
is evaluated relative to the posterior predictive distribution, which captures the model-implied
values of SRMR from data consistent with the posterior distribution. The second panel of
Figure 4 contains the scatterplot of realized and posterior predicted values of SRMR where
the distribution of the posterior predicted values represents the sampling variability of SRMR
over data that are consistent with the model, marginalizing over the posterior distribution. It
is observed that the realized values are consistently larger than the posterior predicted values;
the ppost value is .03 and the pcpost value is 0.00.
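Given a vector of realized SRMR values, such as the hypothetical D_real from the discrepancy-measure sketch earlier, the probabilistic comparison to a conventional cutoff is a one-line summary.

```r
## Posterior probability that SRMR is at or below a conventional cutoff (here .08).
mean(D_real <= 0.08)
```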

Associations between Variables


Densities for the posterior predictive distributions of the covariances between the subscales
are plotted above the diagonal in Figure 5. In each plot, the vertical line corresponds to the
covariance from Xobs . Below the diagonal are scatterplots of the realized and posterior predicted
values of MB-Cor, where the unit line is added as a reference. Note that in each plot of MB-Cor
the posterior predicted values appear randomly scattered around 0. This is to be expected, as the
Xpostpred are generated through the model; hence the model should perform well in explaining
those associations. The variability of the posterior predictive values of MB-Cor represents the
sampling variability in data that are consistent with the model, marginalizing over the posterior
distribution. Table 3 contains p values for these measures for the pairs of variables. Entries
above the diagonal are the ppost , pcpost , and pppost values for the covariance; entries below the
diagonal are the ppost and pcpost values for MB-Cor.
Consideration of Figure 5 reveals that the results for the covariance and the MB-Cor are
similar in indicating the model suffers in that it underpredicts the association between academic
and intellectual development and institutional goal commitment; for the covariance, ppost < .01, pcpost = .00, pppost = .00, and for MB-Cor ppost = .00 and pcpost = .00. Likewise, the model underpredicts the association between faculty interaction and faculty concern; for the covariance, ppost = .05, pcpost = .00, pppost = .05, and for MB-Cor ppost = .00, pcpost = .00. In contrast, the model overpredicts the association between faculty interaction and institutional goal commitment; for the covariance, ppost = .99, pcpost = 1.00, pppost = .99, and for MB-Cor ppost = 1.00, pcpost = 1.00. Similarly, the model overpredicts the association between

TABLE 3
Estimated p Values for the Covariance and MB-Cor

       PI              FI              AD              FC              IGC
PI     —               0.41/0.27/0.41  0.50/0.48/0.49  0.93/1.00/0.93  0.78/0.97/0.77
FI     0.04/0.01       —               0.76/0.99/0.77  0.05/0.00/0.05  0.99/1.00/0.99
AD     0.10/0.06       0.96/0.99       —               0.39/0.08/0.39  <0.01/0.00/0.00
FC     0.98/0.99       0.00/0.00       0.30/0.19       —               0.54/0.81/0.54
IGC    0.77/0.84       1.00/1.00       0.00/0.00       0.58/0.73       —

Note. Entries above the main diagonal are ppost/pcpost/pppost values for the covariance; entries below the main diagonal are ppost/pcpost values for MB-Cor. Entries of <0.01 indicate that the p value was above .00 but rounded to .00. PI = Peer Interaction; FI = Faculty Interaction; AD = Academic and Intellectual Development; FC = Faculty Concern; IGC = Institutional Goal Commitment.

FIGURE 5 Posterior predictive distributions and realized values for the covariance and MB-Cor. Note. PI = Peer Interaction; FI = Faculty Interaction; AD = Academic and Intellectual Development; FC = Faculty Concern; IGC = Institutional Goal Commitment.

peer interaction and faculty concern; for the covariance, ppost = .93, pcpost = 1.00, pppost = .93, and for MB-Cor ppost = .99, pcpost = .99.
The magnitudes of the realized values of MB-Cor serve as an effect size for the magnitude
of the misfit in modeling the association between two variables in the correlation metric.
Consideration of the realized values of MB-Cor (graphically plotted as the horizontal axes
in Figure 5) indicates that the 95% highest posterior density interval for MB-Cor between
academic and intellectual development and institutional goal commitment is (.31, .40), the
95% highest posterior density interval for MB-Cor between faculty interaction and faculty
concern is (.12, .24), and the 95% highest posterior density interval for MB-Cor between
faculty interaction and institutional goal commitment is (−.26, −.17).

Synthesizing the results from the example, it can be concluded that the model accounts for
the variables’ individual means and variances well (as evidenced by those test statistics), but
suffers in terms of characterizing the associations between the variables overall (as evidenced
by LR and SRMR). Pursuing this further, the model’s main weaknesses appear to be that
it considerably underpredicts the association between academic and intellectual development
and institutional goal commitment and between faculty interaction and faculty concern while
overpredicting the association between faculty interaction and institutional goal commitment.
These results suggest that a two-factor model in which the subscales are organized around a
factor for faculty measures and a factor for student measures might provide better fit (French
& Oakes, 2004), although other possibilities certainly exist.

DISCUSSION

Bayesian modeling is characterized by the use of distributions for all unknown quantities.
Whereas frequentist approaches employ point estimates of parameters in the calculation of
discrepancy measures and in deriving sampling distributions, Bayesian approaches employ
distributions of parameters in the calculation of discrepancy measures and marginalize over
distributions of parameters in deriving reference distributions. Note that for some discrepancy
measures (e.g., SRMR and MB-Cor), the magnitude of the realized values can be interpreted
directly, without appeal to a reference distribution.
With the exception of prior predictive model checking, the procedures employ the
posterior distribution of the parameters after conditioning on the data (and possibly other
terms) in this capacity. Thus, prior predictive checks are entirely dependent on the choice
of the prior distribution, as the prior is essentially what is being evaluated. The remaining
procedures that use a posterior distribution are subject to the influence of the prior to the extent
that the prior influences the posterior, which decreases as sample size increases. It is noteworthy
that cross-validation procedures employ a subset of the observed data to obtain the posterior
and therefore might be more influenced by the prior than the other procedures. Options for the
analyst concerned about the influence of the prior include running the analysis using different
priors to assess the sensitivity of the results to the choice of the prior, or the use of a highly
diffuse prior so that the posterior is more fully determined by the data, as was done for the
illustration (see Appendix).
Of the Bayesian approaches to data–model fit, PPMC has received the most attention in
terms of application, methodological research, and criticism. I briefly review its advantageous
features, connecting them to the results of the illustration, and then turn to the criticism of
PPMC that motivates many of the other procedures. PPMC holds advantages over traditional
approaches in that it incorporates uncertainty of estimation into model checking by using the
posterior distribution rather than a point estimate, does not depend on asymptotic arguments, and
empirically constructs the reference distribution, supporting the use of complex test statistics
and discrepancy measures (for discussions, see, e.g., Gelman, 2003, 2004; Gelman et al., 1996;
Levy et al., 2009; Meng, 1994). MB-Cor is an example of one such function for which sampling distributions might prove difficult to derive, yet it is easily investigated using PPMC.
With the flexibility of the model checking procedures to handle a wide class of functions
comes the responsibility for selecting or constructing appropriate discrepancy measures (and

test statistics). Careful consideration of how the chosen functions characterize the (lack of) fit
between the data and the model is needed. In the illustration, the mean and variance did not
indicate any lack of fit, whereas the remaining functions that involve examining the associations
between the variables did. The model has an effectively saturated mean and variance structure
because each variable has unique intercept and error variance parameters. Hence, solid fit of
the model in terms of the means and variances is not surprising. However, the model places
restrictions on the associations between the variables. As such the remaining measures had
the potential to indicate a lack of fit. The implication is that due consideration must be given
to the choice of the discrepancy measures and test statistics such that the measures reflect
(a) substantive aspects of the theory of interest, and (b) features of the data that might not

necessarily be adequately modeled. A fertile area of research involves identifying effective


discrepancy measures for investigating different model violations.
From a hypothesis testing perspective, PPMC suffers in that ppost values can yield conser-
vative tests (Robins et al., 2000). However, appropriately chosen discrepancy measures should
yield ppost values that perform quite well under null conditions (Bayarri & Berger, 2000b;
Gelman, 2007; Johnson, 2007). Not surprisingly, research on ppost values in latent variable
modeling has been mixed, in some cases finding Type I error rates to be less than nominal
values and in other cases finding Type I error rates at or slightly below nominal values (Fu,
Bolt, & Li, 2005; Levy et al., 2009; Sinharay & Johnson, 2003). It is perhaps because of the
possible conservativeness that widely differing recommendations on what counts as an extreme
ppost value have been offered (e.g., Sinharay et al., 2006; Zhu & Lee, 2001).
Calibrated and partial posterior predictive checking can be viewed as attempts to address
the possible conservativeness of ppost , thereby yielding p values (pcpost and pppost, respectively)
that do support classical interpretations. However, they suffer in that (a) in the case of pppost,
they are limited to test statistics; (b) they pose computational burdens, pcpost via the need for
double simulation and pppost via the need to construct a Metropolis sampler; and (c) they do
not support graphical representations for diagnostic purposes.
In connection with this last point, the use of graphical displays is a strength of PPMC
relative to other methods. The plots in Figures 2 through 5 efficiently allow the analyst to
synthesize the (in)adequacy of the model in terms of accounting for the features captured
by those discrepancy measures and test statistics. A diagnostic perspective on PPMC focuses
on such graphical displays as powerful communicative devices for characterizing the model’s
strengths and weaknesses (Gelman, 2004), and views ppost values as summaries of the analyses.
Graphical representations might also facilitate the investigation of patterns from multiple
discrepancy measures, which could reveal much more information than can be gleaned from
considering one result at a time. Figure 5 illustrates one approach, where assembling the
results graphically allows the researcher to visually assess the patterns of the results from
MB-Cor. The analysis of the set of results provides more information regarding features
of the data and the potential model flaws than would be obtained if we were to consider
each result separately. Efficient graphical representations when conducting PPMC can greatly
facilitate the investigation of patterns when many discrepancy measures are employed and
need to be synthesized (see, e.g., Gelman, 2004; Sinharay et al., 2006). Alternatively, the
analyst might organize the results by ranking them, say, by the p value to identify areas of
primary concern. Similarly, the flexibility of the Bayesian approach allows for the construction
of discrepancy measures (and test statistics) at multiple levels of analysis to be analyzed
sequentially that can reduce the burden (Levy & Svetina, 2011). For example, in the current
context, if the results of the LR and SRMR indicate the overall covariance structure is well
accounted for, analyses of the associations among all the pairs of observables might not be
needed.
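As a concrete sketch of the ranking strategy just described, once a ppost value has been computed for each discrepancy measure, the results might be organized along the following lines in R. The named values are hypothetical placeholders standing in for an analyst's actual results.

# Hypothetical p_post values, one per discrepancy measure (placeholders only)
p_post <- c(LR = .04, SRMR = .07, "cor(x1, x2)" = .21,
            "cor(x1, x3)" = .001, "cor(x2, x3)" = .48)
sort(p_post)                     # ranked from smallest; values near 0 or 1 warrant attention
names(p_post)[p_post < .05]      # or apply whatever threshold the analyst deems extreme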
The relative performance of these approaches has not been thoroughly explored. Method-
ological research is needed to provide specific recommendations for practitioners. From the
hypothesis testing perspective, methodological research on which discrepancy measures yield
ppost values with adequate Type I error rates in SEM is needed to support its use. The identi-
fication of discrepancy measures that yield well-behaving ppost values (from this perspective)
might obviate the need for the more computationally intensive pcpost and pppost. Similarly,
research is needed on the use of these data–model fit procedures for the related situation of
model comparison in which multiple models are under consideration. See Rubin and Stern
(1994) for early efforts in this area using PPMC and see Gelfand (1996) for the use of the
cross-validation predictive techniques to approximate the Bayes factor.
In addition to the findings of methodological research, practitioners’ choices among the
alternatives will no doubt be influenced by the availability of resources. One barrier to usage of
Bayesian data–model fit procedures in SEM is the lack of dedicated software. Recently, AMOS
(Arbuckle, 2006) and Mplus (Muthén & Muthén, 1998–2010) implemented Bayesian estimation
and PPMC procedures for LR. The general purpose software WinBUGS (Spiegelhalter et al.,
2007) can be used to enact MCMC estimation for a wide variety of models, and code
can be included to obtain realized and posterior predictive distributions and compute ppost .
Although not discussed in detail in this work, WinBUGS can similarly be used to calculate
prior predictive and cross-validated p values. To date, no software package has implemented
the routine calculation of pcpost or pppost. If a method for calculating ppost is available, the
calculation of pcpost additionally requires an external process that (a) generates draws from
the prior predictive distribution, (b) governs the calculation of ppost based on each such draw,
and (c) aggregates the distribution of these values and compares this distribution to the value
of ppost from the observed data. This is a conceptually simple but computationally intensive
process. The calculation of pppost poses a slightly different challenge in that it requires the
construction of a Metropolis sampler after the posterior distribution for the model parameters has been obtained.
In this work, draws from the posterior predictive and partial posterior predictive distributions
and the calculations of ppost, pcpost, and pppost were conducted using R code, available at https://
sites.google.com/a/asu.edu/roylevy/papers-software.
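To convey the structure of the double simulation in steps (a) through (c), the following minimal sketch in R computes pcpost for a deliberately simple normal-mean model (yi ∼ N(μ, 1) with prior μ ∼ N(0, 1)), using the sample variance as the test statistic, rather than for an SEM. The model, sample size, and numbers of draws are arbitrary choices made only to keep the sketch self-contained; it shows the shape of the calculation, not a recommended implementation.

set.seed(1)

# p_post for a single data set: posterior draws of mu, posterior predictive data sets,
# and the proportion of predicted test statistics at least as large as the observed one
ppost_value <- function(y, T_stat = var, S = 500) {
  n <- length(y)
  post_var  <- 1 / (n + 1)          # conjugate posterior for mu (unit error variance, N(0, 1) prior)
  post_mean <- post_var * sum(y)
  t_obs <- T_stat(y)
  t_rep <- replicate(S, {
    mu_s  <- rnorm(1, post_mean, sqrt(post_var))  # draw from the posterior
    y_rep <- rnorm(n, mu_s, 1)                    # posterior predictive data set
    T_stat(y_rep)
  })
  mean(t_rep >= t_obs)
}

y_obs <- rnorm(30, mean = 0, sd = 1.5)   # stand-in "observed" data
p_obs <- ppost_value(y_obs)

# (a) draws from the prior predictive distribution,
# (b) p_post recomputed for each such draw,
# (c) the observed p_post located within that reference distribution
p_ref <- replicate(200, {
  mu_0    <- rnorm(1, 0, 1)
  y_prior <- rnorm(length(y_obs), mu_0, 1)
  ppost_value(y_prior)
})
p_cpost <- mean(p_ref <= p_obs)

For pppost, the analogous external step would instead be the Metropolis sampler noted above, targeting the partial posterior; that step is not sketched here.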
In light of the current state of the field, I advance the following summary and recommen-
dations. Bayesian data–model fit assessment is characterized by the use of distributions for the model
parameters. Most applications will employ posterior distributions in some guise for these parameters. For discrepancy
measures with natural interpretations, the (posterior) distribution of the realized values might
be informative. If the analyst adopts a diagnostic perspective on data–model fit assessment,
PPMC offers the appeal of graphical summaries as well as ppost summaries, reasonably practical
implementation, and applicability to any desired discrepancy measure. If the analyst adopts a
hypothesis testing perspective and desires classical interpretations of results, PPMC and ppost
can be employed with certain discrepancy measures. Alternatively, pcpost or pppost could be
used (the latter for test statistics only), provided the analyst programs the requisite calcula-
tions. Methodological research and the development of software for conducting these analyses
routinely will greatly enhance their viability for practitioners.
ACKNOWLEDGMENTS

I wish to thank the Teacher’s Corner editor and three reviewers for their insightful comments
on earlier versions of this article.

REFERENCES

Ansari, A., Jedidi, K., & Dube, L. (2002). Heterogeneous factor analysis models: A Bayesian approach. Psychometrika,
67, 49–78.
Arbuckle, J. L. (2006). Amos 7.0 user’s guide. Chicago, IL: SPSS.
Arminger, G., & Muthén, B. O. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs
sampler and the Metropolis–Hastings algorithm. Psychometrika, 63, 271–300.
Bayarri, M. J., & Berger, J. O. (2000a). P values for composite null models. Journal of the American Statistical
Association, 95, 1127–1142.
Bayarri, M. J., & Berger, J. O. (2000b). Rejoinder. Journal of the American Statistical Association, 95, 1168–1170.
Bentler, P. M. (2004). EQS 6 structural equations program manual. Encino, CA: Multivariate Software.
Berkhof, J., van Mechelen, I., & Gelman, A. (2004). Enhancing the performance of a posterior predictive check (Tech.
Rep. No. 0350). Louvain-la-Neuve, Belgium: IAP Statistics Network.
Box, G. E. P. (1980). Sampling and Bayes inference in scientific modeling and robustness. Journal of the Royal
Statistical Society Series A, 143, 383–430.
Box, G. E. P., & Draper, N. R. (1987). Empirical model-building and response surfaces. New York, NY: Wiley.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of
Computational and Graphical Statistics, 7, 434–455.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research,
18, 147–167.
Dahl, F. A. (2006). On the conservativeness of posterior predictive p-values. Statistics & Probability Letters, 76,
1170–1174.
Evans, M. (1997). Bayesian inference procedures derived via the concept of relative surprise. Communications in
Statistics Part A: Theory and Methods, 26, 1125–1143.
Evans, M. (2000). Comment. Journal of the American Statistical Association, 95, 1160–1163.
French, B. F., & Oakes, W. C. (2004). Reliability and validity evidence for the Institutional Integration Scale.
Educational and Psychological Measurement, 64, 88–98.
Fu, J., Bolt, D. M., & Li, Y. (2005, April). Evaluating item fit for a polytomous fusion model using posterior predictive
checks. Paper presented at the annual meeting of the National Council on Measurement in Education, Montréal,
Canada.
Gelfand, A. E. (1996). Model determination using sampling-based methods. In W. R. Gilks, S. Richardson, & D. J.
Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 145–161). London, England: Chapman & Hall.
Gelman, A. (2003). A Bayesian formulation of exploratory data analysis and goodness-of-fit testing. International
Statistical Review, 71, 369–382.
Gelman, A. (2004). Exploratory data analysis for complex models. Journal of Computational and Graphical Statistics,
13, 755–779.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1, 515–
533.
Gelman, A. (2007). Comment: Bayesian checking of the second levels of hierarchical models. Statistical Science, 22,
349–352.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian data analysis. London, England: Chapman
& Hall.
Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepan-
cies. Statistica Sinica, 6, 733–807.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov chain Monte Carlo in practice. London,
England: Chapman & Hall.
Gill, J. (2007). Bayesian methods: A social and behavioral sciences approach (2nd ed.). Boca Raton, FL: Chapman
& Hall/CRC.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal
Statistical Society Series B, 29, 83–100.
Hjort, N., Dahl, F. A., & Steinbakk, G. H. (2006). Post-processing posterior predictive p values. Journal of the American
Statistical Association, 101, 1157–1174.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6, 1–55.
Johnson, V. E. (2007). Bayesian model assessment using pivotal quantities. Bayesian Analysis, 2, 719–734.
Lee, S. Y. (2007). Structural equation modeling: A Bayesian approach. West Sussex, UK: Wiley.
Lee, S. Y., & Song, X. Y. (2004). Evaluation of Bayesian and maximum likelihood approaches in analyzing structural
equation models with small samples. Multivariate Behavioral Research, 39, 653–686.
Lee, S. Y., Song, X. Y., & Tang, N. S. (2007). Bayesian methods for analyzing structural equation models with
covariates, interaction, and quadratic latent variables. Structural Equation Modeling, 14, 404–434.
Lee, S. Y., & Tang, N. S. (2006). Bayesian analysis of nonlinear structural equation models with nonignorable missing
data. Psychometrika, 71, 541–564.
Lee, S. Y., & Zhu, H. T. (2000). Statistical analysis of nonlinear structural equation models with continuous and
polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209–232.
Levy, R., Mislevy, R. J., & Sinharay, S. (2009). Posterior predictive model checking for multidimensionality in item
response theory. Applied Psychological Measurement, 33, 519–537.
Levy, R., & Svetina, D. (2011). A generalized dimensionality discrepancy measure for dimensionality assessment in
multidimensional item response theory. British Journal of Mathematical and Statistical Psychology, 64, 208–232.
Lynch, S. M. (2007). Introduction to applied Bayesian statistics and estimation for social scientists. New York, NY:
Springer.
Meng, X. L. (1994). Posterior predictive p-values. The Annals of Statistics, 22, 1142–1160.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Pascarella, E. T., & Terenzini, P. T. (1980). Predicting freshman persistence and voluntary dropout decisions from a
theoretical model. Journal of Higher Education, 51, 60–75.
Press, S. J. (1989). Bayesian statistics: Principles, models, and applications. New York, NY: Wiley.
Robins, J. M., van der Vaart, A., & Ventura, V. (2000). The asymptotic distribution of P values in composite null
models. Journal of the American Statistical Association, 95, 1143–1172.
Rowe, D. B. (2003). Multivariate Bayesian statistics: Models for source separation and signal unmixing. Boca Raton,
FL: CRC Press.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of
Statistics, 12, 1151–1172.
Rubin, D. B. (1996). Comment: On posterior predictive p-values. Statistica Sinica, 6, 787–792.
Rubin, D. B., & Stern, H. S. (1994). Testing in latent class models using a posterior predictive check distribution. In A.
von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 420–438).
Thousand Oaks, CA: Sage.
Rupp, A. A., Dey, D. K., & Zumbo, B. D. (2004). To Bayes or not to Bayes, from whether to when: Applications of
Bayesian methodology to modeling. Structural Equation Modeling, 11, 424–451.
Scheines, R., Hoijtink, H., & Boomsma, A. (1999). Bayesian estimation and testing of structural equation models.
Psychometrika, 64, 37–52.
Sinharay, S., & Johnson, M. (2003). Simulation studies applying posterior predictive model checking for assessing fit
of the common item response theory models (ETS Research Rep. No. RR-03-28). Princeton, NJ: Educational Testing
Service.
Sinharay, S., Johnson, M., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models.
Applied Psychological Measurement, 30, 298–321.
Sinharay, S., & Stern, H. S. (2003). Posterior predictive model checking in hierarchical models. Journal of Statistical
Planning and Inference, 111, 209–221.
Song, X. Y., & Lee, S. Y. (2002). Analysis of structural equation model with ignorable missing continuous and
polytomous data. Psychometrika, 67, 261–288.
Song, X. Y., Lee, S. Y., & Hser, Y. I. (2009). Bayesian analysis of multivariate latent curve models with nonlinear
longitudinal latent effects. Structural Equation Modeling, 16, 245–266.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity
and fit. Journal of the Royal Statistical Society B, 64, 583–639.
Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2007). WinBUGS user manual: Version 1.4.3. Cambridge,
UK: MRC Biostatistics Unit.
Stern, H. S. (2000). Comment. Journal of the American Statistical Association, 95, 1157–1159.
Zhu, H. T., & Lee, S. Y. (2001). A Bayesian analysis of finite mixtures in the LISREL model. Psychometrika, 66,
133–152.

APPENDIX
The prior distribution is constructed by specifying distributions for the subjects’ latent variables
and for the model parameters. For the model parameters, we assume a priori independence and
employ generalized conjugate priors (Press, 1989; Rowe, 2003; see Lee, 2007, for approaches
using recursive conditional distributions). We specify normal distributions for the intercepts,
factor means, and loadings and inverse-gamma distributions for the variances (Arminger &
Muthén, 1998; Lee, 2007; Rowe, 2003). The priors used in the illustration were:

$\tau_j \sim N(3,\, 10), \quad j = 1, \ldots, J;$

$\lambda_{j1} \sim N(1,\, 10), \quad j = 2, \ldots, J;$

$\theta_{jj} \sim \mathrm{Gamma}^{-1}(5,\, 10), \quad j = 1, \ldots, J.$

Assumptions of subject independence and exchangeability imply that the joint distribution
of the subjects' latent variables $P(\Xi \mid \kappa, \Phi)$ can be factored into the product of common prior
distributions for each $\xi_i$. Assuming normality of the latent variables, the prior for the latent
variables for each subject is defined as

$\xi_i \sim N(\kappa,\, \Phi).$

In the illustration, $\kappa$ was fixed to identify the model. With a single factor, $\Phi = \sigma^2_{\xi_1}$, which
can be modeled similarly to the error variances:

$\sigma^2_{\xi_1} \sim \mathrm{Gamma}^{-1}(5,\, 10).$
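To illustrate how these specifications might be encoded, the following minimal sketch in R draws once from the priors above and generates a prior predictive data set for a single-factor model. The number of indicators and subjects (J = 6, N = 200), fixing the first loading at 1 and κ at 0, reading N(m, v) as a mean–variance parameterization, and reading Gamma^{-1}(5, 10) as shape 5 and scale 10 are assumptions made for this illustration rather than specifications taken from the analyses reported here.

set.seed(1)
J <- 6     # number of observed indicators (illustrative)
N <- 200   # number of subjects (illustrative)

tau    <- rnorm(J, mean = 3, sd = sqrt(10))            # intercepts, tau_j ~ N(3, 10)
lambda <- c(1, rnorm(J - 1, mean = 1, sd = sqrt(10)))  # loadings; first fixed at 1 (assumed identification)
theta  <- 1 / rgamma(J, shape = 5, rate = 10)          # error variances; 1/Gamma(a, rate = b) is inverse-gamma(a, b)
phi    <- 1 / rgamma(1, shape = 5, rate = 10)          # factor variance
kappa  <- 0                                            # factor mean, fixed to identify the model (value assumed)

xi <- rnorm(N, mean = kappa, sd = sqrt(phi))           # latent variables for each subject
X  <- matrix(NA, N, J)                                 # prior predictive observables
for (j in 1:J) {
  X[, j] <- tau[j] + lambda[j] * xi + rnorm(N, 0, sqrt(theta[j]))
}

Repeating such draws many times yields the prior predictive distribution referred to above in connection with pcpost.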
