those listed supply essentially the same basic information, with minor variations in detail.
Thus, the eight-matrix LISREL model, which
arose out of the work of Keesling and Wiley (see
Wiley, 1973) and was subsequently developed to
its current state by Jöreskog (see Jöreskog & Sörbom, 1996), the four-matrix model of Lohmöller
(1981), the three-matrix EQS model of Bentler and
Weeks (1980), and the two-matrix RAM model (see
McArdle & McDonald, 1984) rearrange the same set
of parameters. Not surprisingly, and perhaps not regrettably, user guides and texts on this topic are not
in agreement in their recommendations about the
style of presentation of results (e.g., see Bollen, 1989;
Loehlin, 1992; Long, 1983a, 1983b). There is even
less agreement in the form of the results actually reported in articles on applications.
It would be immodest for any journal article to
offer a code of practice for the presentation of SEM
results. It could also be counterproductive. (We note
that for a long time there was a uniformly accepted
convention for the publication of analysis of variance,
or ANOVA results: the standard ANOVA table and
the table of cell means. The near-disappearance of this
from journals is regrettable.) Sound guidelines for the
reporting of SEM results have been offered previously
by Steiger (1988), Breckler (1990), Raykov, Tomer,
and Nesselroade (1991), Hoyle and Panter (1995), and
Boomsma (2000). MacCallum and Austin (2000) provided an excellent general survey of problems in applications of SEM.
Our objective is to review some principles for reporting SEM results. We also summarize some observations on the variety of treatments of results given in
a selected set of reports on applications. The comparison of principles with practice is intended to underscore the importance of the recommendations made
here. These are aimed at increasing the information
supplied to the reader. Increased information should
allow a more critical assessment of the study reported
and serve to advance scientific knowledge. Naturally,
some of our recommendations are essentially endorsements of those already given. Furthermore, this topic
may need occasional revisiting until practice conforms with principles. We concentrate (appropriately)
on matters in which there should be little disagreement. However, we do address a number of fresh
elements here. We try to give fairly specific guidance
on implementing previously published recommendations as well as our own. We also offer a few mild
departures from previous suggestions. It is assumed
that the reader is familiar with basic SEM method and
with common terminology. However, we do offer occasional reminders about method, as well as remarks
about recent developments that the reader may have
missed.
From well-known SEM principles, we can formulate a list of results that we might hope to find in a
comprehensive report, and we can check current practice against this list. Of course, there can be a conflict
between an ideal of completeness and the understandable desire for conciseness in journal publications. This conflict is severe in the case of very large
models.
We surveyed articles using SEM from 1995 to 1997
in the following journals: British Journal of Psychology, Child Development, Developmental Psychology,
Journal of Abnormal Psychology, Journal of Applied
Psychology, Journal of Applied Social Psychology,
Journal of Consulting and Clinical Psychology, Journal of Counseling Psychology, Journal of Educational
Psychology, Journal of Family Psychology, Journal
of Personality and Social Psychology, Journal of Research in Personality, and Psychological Assessment.
The method of search was simply to look for all path
diagrams in the journals and period named. The intention was to find a reasonably representative
sample, limited to path models with latent variables
and to single-population studies.1 This survey yielded
100 possible articles for review, of which 41 met the
criteria: (a) path diagrams must be provided; (b) models being fitted must include both measurement and
Model Specification
Generally, a structural equation model is a complex composite statistical hypothesis. It consists of
two main parts: The measurement model represents a
set of p observable variables as multiple indicators
of a smaller set of m latent variables, which are usually common factors. The path model describes relations of dependency, usually accepted to be in some sense causal, between the latent variables. We reserve the term structural model here for the composite
SEM, the combined measurement and path models.
This avoids the ambiguity that arises when the path
model component is also labeled the structural
model.
In most applications the measurement model is a
conventional confirmatory factor model; the latent
variables are just common factors and the error or
specific terms are uncorrelated. The most common
exception concerns longitudinal models with measurements replicated at two or more time points. Longitudinal models usually need specific components
that are correlated across replicated measurements.
Very commonly, the measurement model is an independent clusters model, that is, a factor model in
which no indicator loads on more than one common
factor.
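In matrix terms, an independent clusters measurement model implies the covariance structure Σ = ΛΦΛ′ + Θ, with each row of Λ containing a single nonzero loading. The following sketch (illustrative values only, not drawn from any study in the survey) constructs this implied matrix for six indicators of two correlated factors:

```python
import numpy as np

# Independent-clusters measurement model: each indicator loads on exactly
# one common factor. All numerical values here are illustrative.
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],   # indicators 1-3 load only on factor 1
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.5],   # indicators 4-6 load only on factor 2
])
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])            # factor correlation matrix

common = Lambda @ Phi @ Lambda.T        # variance due to the common factors
Theta = np.diag(1.0 - np.diag(common))  # uncorrelated unique variances
Sigma = common + Theta                  # implied covariance matrix of the indicators
```

Because the unique variances are taken as unit complements of the communalities, Sigma here is the implied correlation matrix of the six indicators.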
In turn, the path model structures the correlation or
MCDONALD AND HO
covariance matrix of the common factors. This structuring usually corresponds to a set of conjectured
causal relations. The path model itself is also a composite hypothesis. It requires the specification both of
a set of present versus absent directed arcs (paths)
between latent variables, and a set of present versus
absent nondirected arcs. Terminology in SEM is currently in a state of flux, because of the introduction of
ideas from directed acyclic graph theory (for a general
account, see Pearl, 1998, 2000). Here we follow Pearl
in referring to a single direct connection between two
variables in a path diagram as an arc, not a path. We
reserve the term path for a sequence of arcs connecting two variables.
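To make the structuring concrete: if B holds the directed-arc coefficients among m latent variables (recursive, so B can be taken strictly lower triangular) and Ψ holds the disturbance variances and covariances (the nondirected arcs), the implied latent covariance matrix is Φ = (I − B)⁻¹Ψ(I − B)⁻ᵀ. A minimal sketch, with made-up values echoing an F1 → F2, F1 → F3 arrangement:

```python
import numpy as np

# Recursive path model among three latent variables; values are illustrative.
# B[i, j] is the coefficient for the directed arc from variable j to variable i.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],      # F1 -> F2
              [0.3, 0.0, 0.0]])     # F1 -> F3
Psi = np.array([[1.00, 0.00, 0.00],
                [0.00, 0.75, 0.20],
                [0.00, 0.20, 0.91]])  # nondirected arc: correlated disturbances of F2, F3

T = np.linalg.inv(np.eye(3) - B)
Phi = T @ Psi @ T.T                 # implied covariance matrix of F1, F2, F3
```

With these values Φ is a correlation matrix, and the covariance of F2 and F3 (.35) decomposes into a part transmitted through F1 (.15) and the disturbance covariance (.20).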
Pearl (2000) made a strong case for the position
that a path model represents the operations of causality. The directed arcs indicate direct effects of conjectural (counterfactual) actions, interventions, or
states of the world, whereas nondirected arcs represent correlated disturbances, random terms corresponding to variations not explained by the model. A
directed arc in the graph of a path model is a single-headed arrow from one variable to another. A nondirected arc is usually drawn as a double-headed arrow.
(The reader might wish to refer to Figures 1–3, where
directed arcs drawn from the latent variable F1 to F2
and to F3 and from F1 to the indicator variables Y1 and
Y2 are examples of directed arcs, whereas the double-headed arrow between F2 and F3 in Figure 1 or between d2 and d3 in Figure 2 is an example of a nondirected arc.) On the causal interpretation, the absence of
a directed arc from one variable to another implies the
absence of a direct effect, whereas the absence of a
nondirected arc implies that there are no omitted variables explaining their relationship. (These issues will
be explored in greater detail later.)
In contrast to path equations, regression equations
are essentially predictive and correspond to conditioning on observations of explanatory variables without
manipulation, actual or theoretical. This is a rather
technical distinction. For our purposes we note that
residuals in a linear regression equation are uncorrelated with the independent variables by definition.
The disturbances (unexplained variations) in a path
equation can be correlated with the causal variables in
that equation. Pearl (2000) gave graph-theory conditions under which a path equation is a regression
equation; McDonald (in press) gives algebraic conditions. Most applications use recursive models, with no
closed cycles formed by directed paths. With commonly made assumptions, the path equations of re-
Identifiability
Desirably, the SEM report will contain an explicit
account of the conditions on the model that will secure identifiability, which is logically prior to estimation. Pearl (1998, 2000) gave graphical conditions for
ployed. Instead, a rather technical inquiry is suggested. With all disturbances allowed to be nonzero,
the rank and order rules are applied to see if the exogenous variables can yield identifiability by serving
as instruments. These rules are too technical to be
described here; for an introduction to this method, see
Bollen (1989). (We note that if we follow these rules,
the covariances between disturbances cannot be accounted for by omitted variables.) The contrary view
is that we may assume orthogonality in nonrecursive
models. This belief is at least implicit in the work of
a number of authors. For example, Spirtes, Richardson, Meek, Scheines, and Glymour (1998) used it in a
graph-theory account, and MacCallum, Wegener,
Uchino, and Fabrigar (1993) used it in an account of
equivalent models.2 There appear to be unresolved
problems in the foundations of nonrecursive models.
These have been interpreted in a number of ways,
including feedback, systems measured in equilibrium, or averaged time series (see McDonald, 1997,
for a nontechnical account of some of these issues,
and Fisher, 1970, for a technical account of the averaged time series; see, also, Gollob & Reichardt, 1987,
for the problem of time lags in such models). In the
current state of knowledge we cannot adjudicate between these views, but it is fair to state that it is much
easier to draw a nonrecursive loop in a path model
than to motivate it rigorously.
Of the 41 studies surveyed, 10 have fully ordered
models (except exogenous variables). Of these, all
except 1 followed the orthogonality rule (and, equivalently, the precedence rule). The exception gave no
substantive reason for including a nondirected arc between two variables connected by a directed arc (and
no discussion of a likely identifiability problem). Of
the remaining 31 studies that have one or more unordered pairs of latent variables, 7 chose nondirected
arcs for the unordered pairs, as if they were following
the precedence rule, and 24 had no correlated distur-
Of the 41 reports in the survey, 40 gave standardized solutions, the exception being also the only nonrecursive model. All these are presumably computed
by the rescaling method, as they all used either
LISREL or EQS for the analyses.
It is certainly an old psychometric custom to interpret the numerical values of standardized factor loadings, standardized regression coefficients, and standardized path coefficients. It seems as though these
are often thought to be metrically comparable, although their unstandardized counterparts are clearly
not. This is an unsuitable place for a lengthy discussion of the dilemmas underlying the use of standardized coefficients, unstandardized coefficients, or variance explained as measures of the importance of an
explanatory variable in a regression or in a path
model.
We accept that standardization either before or after
estimation is virtually unavoidable for applications of
a path model with latent variables. Standardization
avoids underidentifiability due to arbitrariness of
scale. Experience seems to show that a completely
standardized solution also aids interpretation of the
results. As noted, some computer software is available
that uses methods for obtaining standardized solutions
with correct standard errors. We hope that the use of
these methods will increase in the near future.
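The rescaling method itself is elementary: an unstandardized coefficient for a directed arc is multiplied by the ratio of the (model-implied) standard deviation of its source variable to that of its destination. A sketch with hypothetical values:

```python
def standardize(b_unstd, sd_cause, sd_effect):
    """Rescale one directed-arc coefficient to the fully standardized metric."""
    return b_unstd * sd_cause / sd_effect

# Hypothetical unstandardized coefficient and standard deviations.
b = standardize(0.42, sd_cause=3.0, sd_effect=2.0)
print(round(b, 2))  # 0.63
```

As noted above, standard errors computed for the unstandardized solution do not transfer to a solution standardized by this rescaling.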
Multivariate Normality
Both ML and GLS estimation in SEM require the
assumption of multivariate normality. However, as
Micceri (1989) suggested, much social and behavioral
science data may fail to satisfy this assumption. Several studies of the robustness of the multivariate nor-
tion of dichotomous, ordered polytomous, and measured variables. Like Browne's (1984) ADF estimator, it requires a very large sample size to obtain
reliable weight matrices. Simulation studies (Muthén,
1989; Muthén & Kaplan, 1992) suggest that the CVM
estimator outperforms the Satorra and Bentler (1994)
and ADF estimators when the number of categories of
the variables is small (< 5).
A mild dilemma stems from the fact that ML estimation and its associated statistics seem fairly robust
against violations of normality, whereas the use of
ADF estimators requires extremely large samples for
reliable weight matrices, far larger than are commonly
available in current SEM applications.
Accordingly, we hesitate to make firm recommendations for the resolution of this dilemma, beyond
noting that Mardia's (1970) test of multivariate skewness and kurtosis is well known and implemented in
available computer software. It should, therefore, be
easy for the investigator to see if a problem appears to
exist and to report it to the reader. But in many cases
the sample size will require the investigator to rely on
the robustness of ML/GLS methods.
Reporting Data
Subject to editors' concerns about space, the
sample covariance or correlation matrix gives the
reader a great deal of freedom to formulate and evaluate plausible alternative models. Mere inspection of
this information can often be useful to the reader.
Either the sample correlation matrix (with or without standard deviations) or the sample covariance matrix was supplied in 19 of the 41 reports (M = 14.74
variables, SD = 6.22, range = 9–28). The 22 that did
not supply covariances or correlations have a mean of
21.74 variables (SD = 8.55, range = 10–40), with only 4 in
excess of 28; none of the latter indicated availability
of their data.
It is desirable that this information be readily available. Few authors justify their model in detail, and it
is generally easy to postulate equally plausible, and
possibly better-fitting alternatives on the basis of such
theory as is available. (Any instructor in quantitative
methods who has given this task as an exercise to
graduate students will confirm this statement.) We
suggest that there is a strong case for publishing the
correlations, possibly accompanied by means and
standard deviations, for up to 30 variables. In the case
of a large set of variables, the author can alternatively
indicate a means to access the covariance or correlation matrix if the reader wishes to do so; possibilities
include application to the author or placing the information on the author's, or possibly the journal's, website. In the case of truly large studies, a more comprehensive account than is possible in a journal might
be made available in this way. Information technology is changing too rapidly for this advice to be more
precise.
Goodness of Fit
Except for OLS estimates as usually treated, estimation methods also yield an asymptotic chi-square
and asymptotic standard errors for the parameters in
an identified model. It has long been recognized that
all SEMs are simplified approximations to reality, not
hypotheses that might possibly be true. Accordingly,
an abundance of indices has been developed as measures of the goodness or badness of the approximation
to the distribution from which the sample was drawn,
and it is very easy to invent many more (see, e.g.,
Bollen & Long, 1993; McDonald & Marsh, 1990).
Available computer programs supply varying subsets
of these. We may distinguish absolute and relative fit
indices: Absolute indices are functions of the discrepancies (and sample size and degrees of freedom);
relative indices compare a function of the discrepancies from the fitted model to a function of the discrepancies from a null model. The latter is almost
always the hypothesis that all variables are uncorrelated. Some psychometric theorists have prescribed
criterion levels of fit indices for a decision to regard
the approximation of the model to reality as in some
sense close. This is considered to make the decision
objective (see, e.g., Hu & Bentler, 1999). There are
four known problems with fit indices. First, there is no
established empirical or mathematical basis for their
use. Second, no compelling foundation has been offered for choosing a relative fit index over an absolute
index, or for regarding uncorrelated variables as a null
model. Third, there is not a sufficiently strong correspondence between alternative fit indices for a decision based on one to be consistent with a decision
based on another; the availability of so many could
license a choice of the best-looking in an application,
although we may hope this does not happen. Fourth,
and perhaps most important, a given degree of global
misfit can originate from a correctable misspecification giving a few large discrepancies, or it can be due
to a general scatter of discrepancies not associated
with any particular misspecification. Clear misspecifications can be masked by the indexed fit of the
composite structural model. It is impossible to determine which aspects of the composite hypothesis can
be considered acceptable from the fit indices alone.
Along with checking these, we recommend examining the (standardized) discrepancies in the measurement model and the individual discrepancies between
the latent variable correlations in the measurement
model and the fitted correlations constrained by the
path model.
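The distinction between absolute and relative indices can be made concrete with two widely used examples built from the same chi-square statistics; the formulas are standard, not specific to any one program, and the numbers below are illustrative:

```python
def rmsea(chi2, df, n):
    """Absolute index: root-mean-square error of approximation."""
    d = max(chi2 - df, 0.0) / n          # estimated noncentrality per observation
    return (d / df) ** 0.5

def cfi(chi2_m, df_m, chi2_0, df_0):
    """Relative index: comparative fit index against the uncorrelated-variables null model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_0 - df_0, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den

# Hypothetical chi-squares for a fitted model and the null model on the same data.
print(round(rmsea(97.1, 9, 461), 3))     # 0.146
print(round(cfi(97.1, 9, 980.0, 36), 3)) # 0.907
```

Both indices are functions of the discrepancies and degrees of freedom; neither identifies which part of the composite hypothesis is at fault, which is why we recommend examining the discrepancy matrices as well.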
No fit index can substitute for a detailed examination of the discrepancies. However, if inspection shows that these
are well scattered, they are adequately summarized in
the root mean square residual (RMR), which is an
immediately interpretable measure of the discrepancies. In turn, the GFI is a function of the RMR and the
corresponding root mean square of the sample correlations. The GFI is therefore acceptable to those who
believe that for a given RMR, fit is better if the correlations explained are larger. As shown in some detail by McDonald and Marsh (1990), most other fit
indices can be expressed as functions of the noncentrality parameter connected to ML/GLS estimation.
An unbiased estimate of the noncentrality parameter
is given by d = (χ² − df)/N (McDonald, 1989). This
parameter is also a norm on the sizes of the discrepancies. Accordingly, if the discrepancies are well scattered, such indices capture their general spread well
enough. Those indices that have been shown (as in
McDonald & Marsh, 1990) to be free of sampling
bias, for example, the RMSEA and the URFI (and CFI
if not reset to unity because of overfitting), can be
recommended as supplements to the investigator's
primary judgment based on the discrepancy matrix.
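A sketch of the two quantities this paragraph leans on: the RMR over the distinct elements of a (standardized) discrepancy matrix, and the unbiased noncentrality estimate d = (χ² − df)/N. The residual matrix is invented for illustration, and conventions differ on whether the diagonal is included in the RMR:

```python
import numpy as np

def rmr(residuals, include_diagonal=True):
    """Root-mean-square residual over the distinct elements of a discrepancy matrix."""
    R = np.asarray(residuals, dtype=float)
    k = 0 if include_diagonal else 1
    vals = R[np.triu_indices_from(R, k=k)]
    return float(np.sqrt(np.mean(vals ** 2)))

def noncentrality(chi2, df, n):
    """Unbiased estimate d = (chi2 - df)/N of the noncentrality parameter."""
    return (chi2 - df) / n

# Invented standardized discrepancy matrix for three variables.
R = np.array([[ 0.00,  0.03, -0.05],
              [ 0.03,  0.00,  0.01],
              [-0.05,  0.01,  0.00]])
print(round(rmr(R), 3))  # 0.024
```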
Table 2
Reexamination of the Goodness of Fit of the Structural (s), Measurement (m), and
Path (p) Models for 14 Selected Studies

Study     N    Model      χ²     df      p       d    RMSEA
  1     461      s      124.8   106   .103    .041    .020
                 m       27.7    97  1.000   −.150    .000
                 p       97.1     9   .000    .191    .146
  2     357      s      980.7   391   .000   1.651    .065
                 m      638.1   361   .000    .776    .046
                 p      342.6    30   .000    .876    .171
  3     465      s     1368.2   472   .000   1.929    .064
                 m     1284.1   467   .000   1.757    .061
                 p       84.1     5   .000    .170    .184
  4     326      s      212.0    89   .000    .377    .065
                 m      158.0    81   .000    .236    .054
                 p       54.0     8   .000    .141    .133
  5     507      s      400.4   161   .000    .472    .054
                 m      306.8   155   .000    .299    .044
                 p       93.6     6   .000    .173    .170
  6      81      s      112.0    56   .000    .691    .111
                 m       51.3    42   .153    .115    .052
                 p       60.8    14   .000    .576    .203
  7     237      s      519.2   340   .000    .756    .047
                 m      514.8   333   .000    .767    .048
                 p        4.6     7   .727   −.011    .000
  8     289      s      521.0   180   .000   1.180    .081
                 m      434.9   175   .000    .899    .072
                 p       86.1     5   .000    .280    .237
  9     330      s      288.5   214   .000    .226    .032
                 m      284.4   201   .000    .252    .035
                 p        4.0    13   .991   −.026    .000
 10     197      s      685.8   300   .000   1.958    .081
                 m      562.3   270   .000   1.940    .085
                 p       33.5    30   .299    .018    .025
 11     377      s      161.1    80   .000    .215    .052
                 m      141.2    79   .000    .165    .046
                 p       19.9     1   .000    .050    .224
 12    1556      s      725.0   269   .000    .293    .033
                 m      577.0   247   .000    .212    .029
                 p      148.0    22   .000    .081    .061
 13      84      s       50.2    43   .209    .086    .045
                 m       44.7    38   .211    .080    .046
                 p        5.5     5   .356    .006    .035
 14      70      s       41.9    21   .004    .298    .119
                 m       12.4    17   .776   −.066    .000
                 p       29.5     4   .000    .365    .302

Note. s = structural model; m = measurement model; p = path model; d = (χ² − df)/N;
RMSEA = root-mean-square error of approximation.
ances in the measurement model, and the independently estimated directed arc coefficients and disturbance variances and covariances in the path model.
Special cases include: (a) pure measurement models,
that is, traditional confirmatory common factor models; (b) path models for observable variables; and (c)
mixed models in which some variables in the path
model are observable and some are latent.
We note in passing that the possibility of mixed
models underlines a dilemma in SEM, needing further
research. As pointed out by McDonald (1996), the
researcher usually has a choice between making a
path model with latent variables and a path model
with composite variables: simple or weighted sums of
indicators. The common choice seems to be the
former, because it estimates causal relations between
attributes that are corrected for errors of measurement. For these causal relations to be taken seriously,
we must be able to suppose we could, in principle, add
enough further indicators to make error-free measurements. If the number of indicators of each attribute is
small (as seems common in applications), such corrections may themselves be unreliable. In a model in
which some attributes are measured by a single total
test score, we can take scores from items or subtests
of the test as multiple indicators, if we wish to allow
for errors of measurement. In the rare case where a
single item is the only measure of an attribute, there
does not yet seem to be any treatment available for its
error of measurement.
Just 12 of the 41 studies report all parameters,
whereas 20 omit both error and disturbance parameters, 2 omit error variances, 3 omit disturbance variances, 2 give only the directed arc coefficients, and 2
omit parameter estimates in the measurement part of
the model. Even in a fully standardized model, the
reader would find it difficult to construct the error
variances as unit complements of functions of the
other parameters. Generally, we would wish to verify
that the unique variances are not close to zero, corresponding to an improper (Heywood) solution, as
would also be indicated by large standard errors. The
disturbance variances also allow the reader to see
what proportions of variance of the endogenous variables are accounted for by the model (only 2 of the 41
reports explicitly addressed this question).
Standard errors of some parameters were reported
in only 5 of the 41 studies. Of these 5, none reported
all standard errors. Usually, standard errors of unique
and disturbance variances are not reported. Generally,
there appears to be nothing preventing such a report.
Standard errors can be included in tables of the parameter values or, following one convention, put in
parentheses attached to the parameters in a path diagram. However, unless the scaling problem is solved,
either by the constrained minimization method of
Browne and Du Toit (1992) or by the reparameterization method of McDonald et al. (1993), standard errors are not available for the standardized solution. If
the standardized parameters are obtained by rescaling,
it would still be possible to examine and report the
standard errors of the unstandardized parameters, and
to describe statistical inferences based on these before
interpreting the corresponding standardized coefficients. This would follow the comparable procedure
in regression programs.
The obvious and conveniently implemented recommendation is to include all the parameters and their
standard errors in the research report. It is difficult to
see what could prevent this.
The parameters (and standard errors if supplied)
can, with equal convenience and approximately equal
space on the page, be presented in tables or attached
to directed and nondirected arcs in path diagrams.
Path diagrams originated with Sewall Wright (1921),
and have come into current SEM practice rather informally and haphazardly. There is at least some
agreement that the network of relationships in a path
model is most easily appreciated in the form of a path
diagram, a picture of the graph. Formal directed acyclic graph (DAG) theory (see Pearl, 1988, 2000;
Scheines, Spirtes, Glymour, Meek, & Richardson,
1998) is becoming more widely recognized by SEM
researchers. Recognition of this work should soon
lead to a greater formality in the pictorial representation of the hybrid graphs (graphs containing both directed and nondirected arcs) that structure SEMs, and
to their direct use in determining the precise form of
the constraints on the covariances.
We comment, and invite possible disagreement,
that in the common case where the measurement
model is a standard independent cluster model, there
is little to be gained from including the relations between common factors or latent variables and their
indicators in the path diagram. A good case can be
made for presenting the measurement model in the
tabular form of the older factor analysis tradition, and
drawing an uncluttered path diagram containing only
the latent variables (or any observable variables included in the path model), with their arcs and disturbance terms. This allows a much clearer appreciation
of the relations postulated in the path model than if the
tages and disadvantages, and there should be no ambiguity from the reader's viewpoint.
Conclusion
We claim no special authority to offer a code of
practice for the reporting of SEM results. Recommendations, with varying degrees of confidence, have
References
Allison, P. D. (1987). Estimation of linear models with incomplete data. In C. C. Clogg (Ed.), Sociological methodology (pp. 71–103). San Francisco: Jossey-Bass.
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
formance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econometrica, 38, 73–92.
Fornell, C., & Yi, Y.-J. (1992). Assumptions of the two-step approach to latent variable modeling. Sociological Methods and Research, 20, 291–320.
Fraser, C., & McDonald, R. P. (1988). COSAN: Covariance structure analysis. Multivariate Behavioral Research, 23, 263–265.
Gollob, H. F., & Reichardt, C. S. (1987). Taking account of time lags in causal models. Child Development, 58, 80–92.
Graham, J. W., & Hofer, S. M. (2000). Multiple imputation in multivariate research. In T. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multilevel data: Practical issues, applied approaches and specific examples (pp. 201–218). Mahwah, NJ: Erlbaum.
Hartmann, W. M. (1992). The CALIS procedure: Extended user's guide. Cary, NC: SAS Institute.
Hayduk, L. A., & Glaser, D. N. (2000a). Jiving the four-step, waltzing around factor analysis, and other serious fun. Structural Equation Modeling, 7, 1–35.
Hayduk, L. A., & Glaser, D. N. (2000b). Doing the four-step, right 2-3, wrong 2-3: A brief reply to Mulaik and Millsap; Bollen; Bentler; and Herting and Costner. Structural Equation Modeling, 7, 111–123.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 45, 153–161.
Heckman, J. J., & Robb, R. (1986). Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In H. Wainer (Ed.), Drawing inference from self-selected samples (pp. 63–107). New York: Springer.
Herting, J. R., & Costner, H. L. (2000). Another perspective on the proper number of factors and the appropriate number of steps. Structural Equation Modeling, 7, 92–110.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 158–176). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 76–99). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.