
Introduction to Structural Equation Modeling

(Path Analysis)
SGIM Precourse PA08 May 2005
Jeffrey L. Jackson, MD MPH
Kent Dezee, MD MPH
Kevin Douglas, MD
William Shimeall, MD MPH

Traditional multivariate modeling (linear regression, ANOVA, Poisson regression, logistic regression, proportional hazard modeling) is useful for examining direct relationships between independent and dependent variables. All share a common format:

Dependent Variable = Independent variable1 + Independent Variable2 + Independent Variable3

The net result of such modeling is to examine relationships in the following format:

[Diagram: Independent Variables 1, 2, and 3, each with a direct arrow to the Dependent Variable]

However, real life may not be so parsimonious; relationships between various variables may be much more complex, more "web-like," for example:

[Diagram: a web of relationships in which Independent Variables 1-3 and 6 point directly at the Dependent Variable, while Independent Variables 4 and 5 point at Variables 1-3]

In this example, Variables 1-3 and 6 have direct effects on the dependent variable, while Variables 4 and 5 have indirect effects, mediated by effects on Variables 1, 2, and 3. Variable 6 has both direct and indirect effects on the dependent variable. This "web" of relationships could
not be easily modeled with standard regression techniques. On the other hand, structural
equation modeling (SEM) readily allows one to explore such complex interrelationships.
Structural equation modeling emerged in the mid-to-late 1980s in the social sciences as a method of modeling complex relationships. There are several common types of structural equation models:

1) Path analysis, also known as causal modeling, focuses on examining the web of relationships
among measured variables.

[Diagram: a path model linking Mental Disorders, Symptom Severity, and Functional Status in a web of paths leading to Satisfaction]

This is a typical example of a path analysis, an analysis in which one explores the web of
relationships between the presence of mental disorders, functional status, symptom severity and
satisfaction.

2) Confirmatory factor analysis (CFA). Factor analysis comes in two flavors. In exploratory factor analysis, one performs factor analysis in order to explore what factors may underlie a particular set of measurements; this is probably the most commonly performed type of factor analysis. Confirmatory factor analysis tests whether a pre-existing theoretical model underlies a particular set of observations. CFA is a form of structural equation modeling.

[Diagram: observed indicators (psychosocial orientation, patient-centeredness, warmth of encounter) loading on a single latent construct labeled "Orientation"]

When one builds a confirmatory factor analysis, the observed variables are also known as "indicator" variables, because they load together on the underlying theoretical construct. In this case, the indicators (duration of visit, patient-centeredness of the visit, and warmth of the encounter) load together and are indicators of the underlying non-observed construct, termed "orientation" in this model.

3) Structural Regression Models. These are models that elucidate the underlying factors or constructs of observed variables and model the relationships between these theoretical constructs.

[Diagram: indicator variables X1-X3 load on Factor X, Y1-Y3 load on Factor Y, and Z1-Z3 load on Factor Z, with paths connecting the three factors]

In this example, there are 9 indicator variables that load on 3 factors. With CFA or with traditional factor analysis, one can determine that there are 3 underlying factors for these 9 variables, but one cannot determine whether there is a relationship between the underlying factors. With SEM, one can model not only the relationships between measured variables, but also the relationships between unmeasured, hypothetical constructs. With these models, the focus is generally not on hypothesis testing of a particular relationship among the variables (for example, whether men are more likely to have hypertension than women); instead, hypothesis testing focuses on whether the overall web of relationships adequately describes the data. Specifically, hypothesis testing examines the fit of the model to the observed data. SEM modeling requires a perspective shift, from one that focuses on testing specific variable outcomes to one that looks at a more holistic picture.

SEM Variables
Endogenous versus Exogenous (rather than Dependent versus Independent)
With structural equation models, the concept of a dependent variable becomes blurred. In our original example, Variables 1, 2, and 3 were dependent on Variables 4, 5, and 6, while the "dependent" variable depended directly on Variables 1-3 and 6 and indirectly on Variables 4 and 5. In SEM, the terms independent and dependent variable are abandoned; instead, variables are referred to as "exogenous" or "endogenous." Endogenous variables are those modeled as dependent on other variables, while exogenous variables are not dependent on other variables. In this case, Variables 1-3 and the dependent variable are endogenous, and Variables 4-6 are exogenous. Variables with arrows solely going away from them are exogenous; those with any arrow pointing toward them are endogenous.
Latent versus Observed variables

With traditional regression modeling, all the variables are ones that have been measured. They are "observed." SEM models commonly include variables that have not been directly measured and whose existence is deduced from the relationships among a set of measured variables. These variables are referred to, in SEM, as latent variables. Factor analysis provides a common example of a latent variable. In factor analysis, one analyzes the relationships among a set of variables to see what, if any, underlying "factors" exist. If a factor analysis suggests, for example, the existence of 3 factors, it is up to the analyst to try to figure out what those factors mean. SEM allows analysts not just to determine what factors underlie a set of indicators, but also to examine the strength of the relationships between these theoretical constructs.
Unit of analysis
The unit of analysis in SEM is the variance-covariance matrix. Most SEM programs can analyze raw data or data entered as a variance or covariance matrix. The covariance between any two variables is the product of the two variables' standard deviations and their Pearson correlation coefficient.
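To see this identity concretely, here is a minimal sketch in Python (assuming only numpy; the two variables are made up for illustration):

import numpy as np

# Hypothetical data for two variables
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

cov_direct = np.cov(x, y)[0, 1]                    # sample covariance, computed directly
r = np.corrcoef(x, y)[0, 1]                        # Pearson correlation coefficient
cov_identity = r * x.std(ddof=1) * y.std(ddof=1)   # r times the two standard deviations

print(cov_direct, cov_identity)                    # the two values agree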

Let's use a data set as an example. Hamilton examined the relationship between SAT scores, household income, and parental education level. His model was:

[Diagram: Education → SAT and Income → SAT]

Figure 1. Hamilton Data

 Sat   Income   Education
 899   14.345   12.7
 896   16.37    12.6
 897   13.537   12.5
 889   12.552   12.5
 823   11.441   12.2
 857   12.757   12.7
 860   11.799   12.4
 890   10.683   12.5
 889   14.112   12.5
 888   14.573   12.6
 925   13.144   12.6
 869   15.281   12.5
 896   14.121   12.5
 827   10.758   12.2
 908   11.583   12.7
 885   12.343   12.4
 887   12.729   12.3
 790   10.075   12.1
 868   12.636   12.4
 904   10.689   12.6
 888   13.065   12.4

His raw data had 21 subjects in it. The covariance matrix upon which SEM models would be built is:

             |      sat   income  education
-------------+------------------------------
         sat |  1013.33
      income |  23.9495  2.68972
   education |  4.11572  .133444   .028143

Of course, this set of data could be modeled using linear regression:

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  2,    18) =   13.69
       Model |  12229.1104     2  6114.55522           Prob > F      =  0.0002
    Residual |  8037.46099    18   446.52561           R-squared     =  0.6034
-------------+------------------------------           Adj R-squared =  0.5593
       Total |  20266.5714    20  1013.32857           Root MSE      =  21.131

------------------------------------------------------------------------------
         sat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   2.155637   3.294535     0.65   0.521    -4.765924    9.077199
   education |   136.0223   32.20799     4.22   0.001     68.35584    203.6888
       _cons |  -846.1063   383.0464    -2.21   0.040    -1650.857   -41.35576
------------------------------------------------------------------------------
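For readers working outside of Stata, the same regression can be reproduced with a short Python sketch (assuming pandas and statsmodels; only the first five of the 21 rows are shown here for brevity):

import pandas as pd
import statsmodels.api as sm

# First five rows of the Hamilton data above; the full analysis uses all 21
df = pd.DataFrame({
    "sat":       [899, 896, 897, 889, 823],
    "income":    [14.345, 16.37, 13.537, 12.552, 11.441],
    "education": [12.7, 12.6, 12.5, 12.5, 12.2],
})

X = sm.add_constant(df[["income", "education"]])   # intercept plus the two predictors
fit = sm.OLS(df["sat"], X).fit()                   # ordinary least squares
print(fit.summary())                               # coefficients, t statistics, R-squared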

Using a standard SEM modeling program (AMOS), one gets a similar solution:

[AMOS path diagram, unstandardized estimates: Education → SAT = 136.02; Income → SAT = 2.16; Education ↔ Income covariance = .13; an error term pointing at SAT with regression weight fixed at 1 and variance 382.74; r² = 0.6 for SAT]

Standardized versus Unstandardized Output


Just as in other forms of regression, the results from SEM can be presented as standardized or unstandardized estimates. In the above example, the unstandardized estimates of the relationship between education, income, and SAT scores would be interpreted to mean that every one-year increase in the educational status of the parents yields a 136.02-point increase in SAT score, while every thousand-dollar increase in income yields a 2.16-point increase in SAT score. When the estimates are in easily understood units, unstandardized estimates are preferred. However, it is often the case that the units of the included variables have no particular meaning, for example, a 6/10 rating of pain. In this situation, standardized estimates may be easier to interpret. Standardized estimates are obtained by rescaling the unstandardized coefficient: the beta coefficient is multiplied by the ratio of the predictor's standard deviation to the outcome's standard deviation (equivalently, the model is fit to variables standardized to unit variance). For education in the Hamilton data, 136.02 × (0.168/31.83) ≈ 0.72. The standardized results are:

[Path diagram, standardized estimates: Education → SAT = .72; Income → SAT = .11; Education ↔ Income correlation = .49; error term on SAT; r² = 0.6]

Here, each 1 standard deviation increase in education produces a 0.72 standard deviation increase in SAT score. There are several advantages to using standardized estimates. First, they produce a comparable metric for all variables. With unstandardized estimates, the size of the beta coefficient depends on the units of the variables, and it may not be obvious which variable has more influence. With standardized estimates, since all the estimates have been "standardized," it is easy to see which is the more influential variable. For example, in the Hamilton data, it is apparent from the standardized results that education has more influence than income in predicting SAT scores in this set of observations. Another advantage of standardized estimates is that there is a rule of thumb for gauging the magnitude of an effect, that is, how important the relationship is. By convention, standardized coefficients greater than 0.8 are considered large, around 0.5 moderate, and less than 0.2 small.
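These conventions can be checked directly against the Hamilton covariance matrix; a minimal sketch in Python (assuming numpy, with the covariances and unstandardized coefficients taken from the output above):

import numpy as np

# Covariance matrix from the Hamilton data, in the order sat, income, education
cov = np.array([
    [1013.33,    23.9495,  4.11572 ],
    [  23.9495,   2.68972, 0.133444],
    [   4.11572,  0.133444, 0.028143],
])
sd = np.sqrt(np.diag(cov))                  # standard deviations of the three variables

b_income, b_education = 2.155637, 136.0223  # unstandardized regression coefficients
print(b_education * sd[2] / sd[0])          # ~0.72, the standardized education effect
print(b_income * sd[1] / sd[0])             # ~0.11, the standardized income effect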

Error Terms
In SEM modeling, one explicitly models the uncertainty in the model. Each endogenous variable also has an error or "disturbance" term. This disturbance term represents not only the uncertainty or inaccuracy of the measurement, but also all the unknown variables not measured in this particular model. For example, in the Hamilton data, the error term for SAT score represents not only inaccuracy in the measurement of SAT, but all the variance in SAT not explained by the two variables in the model, education and income. In a sense, this error term captures both all the measurement error and all the as-yet-unmeasured variables that influence this endogenous variable.

SEM Notation
A word about standard SEM notation. "Observed" variables are represented by rectangles. An observed variable is a variable that appears explicitly in the data set, one that has been measured. A direct effect from one variable to another is represented by an arrow, the direction of the arrow showing the direction of the effect. It is possible, in SEM, to have a bi-directional effect, represented by two arrows (though these are difficult to model and should be avoided if at all possible); some literature refers to this as a feedback loop. A two-headed curved arrow represents a correlation or covariance between two variables. In such instances the two variables are assumed to covary, but there is no more specific hypothesis about how this correlation arises, that is, the direction of the relationship; some literature refers to this as an unanalyzed association. In the Hamilton data, there was assumed to be a relationship between education and income, but the analysts weren't particularly interested in exploring the specific direction of the relationship and chose to analyze this relationship simply as a "covariance." Latent, or unobserved, variables are represented by circles or ellipses. Endogenous variables have an unobserved error term (some literature refers to these as disturbance rather than error terms).

Confirmatory Factor Analysis
For example, suppose you are studying children playing and measure behaviors that you
label as smiling, laughing, contentment, satisfaction, crying, hitting, biting and yelling. One can
easily imagine such observed variables as representing underlying constructs, such as happiness
and sadness. For the above data, using standard SEM notation, the model might look like:

[Diagram: latent Happiness with arrows to Smiling, Laughing, Contentment, and Satisfaction; latent Sadness with arrows to Crying, Hitting, and Biting; each observed variable has an error term]

The SEM model is an example of “confirmatory factor analysis” as the model seeks to confirm
whether a theoretical underlying construct is reflected in the observed data. Note the direction of
the arrows; the underlying latent construct is believed to “cause” the observed variables. Also
note that each endogenous variable has an associated error or disturbance term.
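In the lavaan-style model syntax used by several SEM packages (for example, the Python package semopy), this measurement model could be sketched as follows; the indicator names are simply the behavior labels above, and "=~" means "is measured by":

from semopy import Model

# Each latent construct "causes" its observed indicators;
# error terms for the indicators are added automatically
desc = """
Happiness =~ Smiling + Laughing + Contentment + Satisfaction
Sadness   =~ Crying + Hitting + Biting
Happiness ~~ Sadness
"""
model = Model(desc)
# model.fit(data) would estimate the loadings from a DataFrame whose
# columns match the indicator names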

In traditional factor analysis, the analysis can be done either a priori or post hoc. A priori, one may have a theoretical model in mind and use factor analysis to confirm that model. On the other hand, one may not have a clear model in mind and may do a factor analysis to see what relationships emerge; this form of factor analysis is known as exploratory factor analysis. SEM should be done as a priori modeling. It is designed to test hypothetical constructs. The analyst builds the model and then runs the analysis. While it is possible to modify the model based on the results, it is not possible to have SEM create the model for you. In SEM modeling, there is no substitute for clear thinking and a thorough understanding of the literature.

Recursive versus nonrecursive models
There are two basic kinds of SEM models: recursive and nonrecursive. Recursive models are much more stable than nonrecursive ones; if at all possible, recursive models are preferred. They are also more straightforward to model. Recursive models have two basic features: their error terms are uncorrelated, and all causal effects are unidirectional.

[Diagram: exogenous X and Y joined by a curved covariance arrow, each with unidirectional paths to endogenous Z and W; Z and W each have an error term shown as an ellipse]

The above is an example of a recursive model. X and Y are exogenous variables; W and Z are endogenous. All endogenous variables in SEM have an error term, represented by the two ellipses. The curved arrow represents covariance; that is, the modelist believes there is a relationship between X and Y but is uncertain of its direction.

Any model with a feedback loop between endogenous variables is considered nonrecursive.

[Diagram: the same model, but with paths in both directions between endogenous Z and W, forming a feedback loop]

This model would be considered nonrecursive because the two endogenous variables affect each other; that is, there is a feedback loop between the two, and one can get from W back to Z in the diagram. A model can also be nonrecursive if there is a loop between two endogenous variables and the error terms for the two endogenous variables covary, that is:

[Diagram: the same model with a feedback loop between Z and W and a curved covariance arrow between the error terms of Z and W]

This model would be considered nonrecursive. Sometimes figuring out whether a model is recursive or nonrecursive can be difficult. Fortunately, SEM modeling programs will inform you whether the model is recursive or nonrecursive. If it emerges as nonrecursive, it is well worth contemplating whether it is essential for the model to be laid out nonrecursively; if at all possible, modify the model to make it recursive.

Identification
An important notion in SEM is identification: the concept that a unique solution can be achieved for the model. For example, suppose you are asked to solve the equation X + Y = 37. Without additional information, it is impossible to obtain a unique solution; X could equal 30 (with a Y of 7), or it could equal 28 (with a Y of 9), and so on. In fact, there are an infinite number of solutions. If one is also given the equation 2X + 7Y = 48, the two equations can now be solved. In algebra, one must have as many equations as there are unknown variables in order to achieve a unique solution. This analogy holds for SEM models. In SEM, some parameters are calculable directly from the data, for example the covariance matrix. Other parameters have to be estimated, for example each path coefficient. With each error term, both the path coefficient and the variance of the error term have to be estimated, since neither is observed. If one has other latent variables, then the coefficients, covariances, and variances of each of these also have to be estimated. It is easy to wind up asking the computer to estimate more parameters than there is information to calculate them. When this happens, the model is said to be "unidentified." There are several things one can do to help ensure the model is identified. First, recursive models are nearly always identified, while nonrecursive models are frequently unidentified; building recursive models will help reduce the possibility of posing unsolvable problems. Another necessity is to specify either the variance or the regression weight for each of the error terms that are created for endogenous variables. Either the regression weight or the variance (or covariance) must be specified; otherwise it is equivalent to asking the program to calculate a unique solution for 2 unknowns (regression weight and variance) from a single equation. Most modelists set the regression coefficient of the error term to 1, since usually one is more interested in the size of the variance of the error term than in the weight of the error term on the endogenous variable. Finally, when one creates a latent variable with several indicator variables, the analyst sets the regression weight of one of the paths to a fixed value, usually 1. It is recommended that one set the strongest path, that is, the strongest relationship among the set of indicator variables, to 1. In essence, this sets the rest of the regression coefficients to be multiples of the strength of the relationship between that indicator variable and the underlying construct.
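As a concrete illustration of that last convention, in lavaan-style model syntax a loading is fixed by premultiplying the path; the exact syntax varies slightly by package, and the indicator names here are hypothetical:

# The "1*" prefix fixes the first loading to 1, anchoring the scale of the
# latent variable; the other loadings are then estimated relative to it
desc = """
Orientation =~ 1*Duration + PatientCentered + Warmth
"""

(Many packages apply this convention automatically when no loading is fixed explicitly.)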

Observations in SEM
Another important concept in SEM is the number of "observations." In traditional regression analysis, the number of observations is the sample size used in the analysis; this is because the program uses the raw data to produce its model. Each line of data used in the analysis is one observation. So if one has 500 lines of data and the regression analysis uses 487 of them, then the analysis is based on 487 observations. However, SEM doesn't use the raw data; instead, the variance (or covariance) matrix is used. The number of observations in SEM is based on the number of variances and covariances in the matrix rather than the number of subjects in the dataset. The number of observations in SEM can be calculated as:
observations = v(v + 1)/2, where v is the number of variables in the model.
So for the Hamilton example, there were 3 variables, 3*4/2=6, yielding 6 possible observations.
If there were 4 variables, there would be 4*5/2 or 10 observations.
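A tiny helper function makes the count explicit:

def sem_observations(v: int) -> int:
    """Number of distinct variances and covariances among v variables."""
    return v * (v + 1) // 2

print(sem_observations(3))   # 6, as in the Hamilton example
print(sem_observations(4))   # 10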

Consider a data set with 4 variables, SAT, Education, Income and New:

             |      sat   income  educat~n      new
-------------+--------------------------------------
         sat |  1013.33
      income |  23.9495  2.68972
   education |  4.11572  .133444   .028143
         new |  1037.28  26.6392   4.24916  1063.92

There are 10 total covariances in this matrix (including the covariance of each variable with itself, that is, its variance).

The number of observations is important because in SEM the number of degrees of freedom is the difference between the number of observations and the number of parameters estimated in the model. The total number of estimated parameters is the number of variances and covariances of exogenous variables (observed or unobserved) plus the number of direct effects on endogenous variables. Consider this example:

Education
SAT Error2
1
Income

1 New job
1
Error

There are two observed exogenous variables, two error terms, and two endogenous variables in this model. The observed exogenous variables are education and new job; the unobserved exogenous variables are the two error terms. The variance of each of these 4 variables has to be estimated from the data, yielding 4 parameters. In addition, there are 6 direct effects in the model, but since the regression weights of the two error terms have been set to 1, only 4 direct effects have to be estimated. (In order for the model to be identified, the direct effects of the two error terms have been set to 1; alternatively, one could set the variances to some fixed value and estimate the regression weights from the error terms to the endogenous variables, but one has to set one or the other, otherwise the model will not be identified.) For this model, then, a total of 8 parameters need to be estimated. There are 4 observed variables in the model, so there are a total of 10 observations. The degrees of freedom are the difference between the number of observations and the number of parameters estimated; in this model, 10 - 8 = 2, so the model has 2 degrees of freedom. If there were no direct effect between education and income, there would have been 7 parameters and 3 degrees of freedom. This model is recursive. Education has both a direct and an indirect effect on SAT score, while the new job variable has a direct effect on income and only an indirect effect on SAT score.

In order to increase the number of degrees of freedom in an SEM model, it is not helpful to increase the sample size; it is necessary to increase the number of variables or to decrease the number of parameters modeled. That is not to say that sample size is unimportant. SEM is a "large sample" method; it is recommended that samples include at least 200 cases, with 20 cases for every parameter modeled, and at a bare minimum 10 cases for every parameter. So for the above model, with 8 parameters, one would want at least 160 cases in the data set.
SEM Variable Preparation
Before embarking on SEM modeling, one must spend some time examining and preparing the
variables for analysis. Not all variables may be modeled in SEM.
Types of Variables in SEM
All endogenous variables need to be continuous, or at least ordinal scaled, and should be normally distributed. This is because most of the hypothesis testing regarding goodness of fit is based on assumptions of normality. Exogenous variables may be continuous, ordinal scaled, or dichotomous. It is recommended that ordinal-scaled variables have at least 5, and preferably more than 7, categories. Endogenous variables that are not normally distributed must be transformed prior to SEM modeling. Recently, methods have been developed to help deal with non-normal distributions, including asymptotically distribution-free estimation and bootstrap methods.
Missing Data
SEM will not run with missing values. Most statistical analyses merely drop cases with missing values. There are four approaches to dealing with missing data.
1. Drop the cases with missing data. With only a few parameters, or if only a handful of data elements are missing, this may be the best choice. With complex models, however, even a few missing data points distributed over several variables can quickly add up to a large percentage of the data missing.
2. Substitute the mean for the missing data. This is a particularly good solution if the data appear to be missing at random; randomly missing data could very likely be well represented by the group average for any particular missing data point. However, if there appears to be a pattern to the missing data, this may not be a good solution. For example, consider a data set in which measures of psychological stress are collected at the same time as information on the presence of an underlying mental disorder. If it turns out that patients who have an underlying mental disorder are more likely to skip certain questions about their psychological stress, such missing data are not random; they are a reflection of a unique patient characteristic. In addition, imputing to the mean will tend to decrease the variance in the dataset (see the sketch after this list). Since variance is the heart of SEM, decreasing it may significantly decrease one's analytic capabilities.
3. Impute the data. Data imputation involves taking the measured variables and trying to predict the value of missing variables. For example, suppose that functional status has some missing data points. If other variables in the data set can predict functional status (age, sex, mental disorders, medical comorbidities), then it would be more accurate to replace missing data with this predicted value than to use the group mean. Only careful scrutiny of the data and awareness of the pattern of missing data points can identify the best method for dealing with missing data. Data imputation is only as good as the predictive model. One should also be cautious in applying imputation methods to more than a small percentage of the data. In a data set of 500 patients, imputing missing functional status for a handful of patients may be reasonable; caution should be used if more than a small percentage of data points needs to be imputed.
4. Import the covariance matrix rather than the raw data. Most SEM programs can analyze data imported as either raw data or a covariance matrix. If only a very few values are missing and one is sure that the covariance pattern would not change with whatever method of filling in is employed, it may be simpler just to perform the analysis based on the covariance matrix of the data one has in the dataset.
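As a quick demonstration of the variance shrinkage mentioned in option 2, here is a minimal sketch in Python (assuming pandas, with made-up data):

import pandas as pd

df = pd.DataFrame({"stress": [2.0, 3.0, None, 5.0, 1.0, None]})
print(df["stress"].var())    # variance among the observed values (~2.92)
df["stress"] = df["stress"].fillna(df["stress"].mean())   # mean imputation
print(df["stress"].var())    # noticeably smaller (~1.75): imputing to the mean shrinks variance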

Steps in SEM

1. Specify the model.
2. Determine whether the model is identified. In practice this means creating error terms for every endogenous variable and specifying the weight of the regression for each error term, and, when using latent variables with more than one indicator, setting the regression weight of at least one path equal to 1.
3. Analyze the model. This will require the use of SEM software (AMOS, LISREL, EQS).
4. Evaluate model fit.
5. Respecify the model based on the model's fit.
6. Publish the results.
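To make the steps concrete, here is a minimal end-to-end sketch in Python, assuming the semopy package and the Hamilton model; hamilton.csv is a hypothetical file containing the 21 rows shown earlier, and semopy's output format differs from that of AMOS, LISREL, or EQS:

import pandas as pd
import semopy

# Step 1: specify the model (SAT regressed on education and income,
# which are allowed to covary); identification is handled automatically
desc = """
sat ~ education + income
education ~~ income
"""

data = pd.read_csv("hamilton.csv")     # hypothetical file with the raw data
model = semopy.Model(desc)
model.fit(data)                        # step 3: analyze the model

print(model.inspect())                 # parameter estimates and standard errors
print(semopy.calc_stats(model))        # step 4: chi-square, CFI, RMSEA, AIC, and more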

Caution should be used in model respecification. Most SEM software will make suggestions for improving the model's fit. Blindly modifying the model according to those suggestions without a good theoretical justification will yield nonsensical, but well-fitted, models. If the SEM software suggests that the fit would be better with the addition of a covariance between variables X and Y, and that suggestion makes good theoretical sense to you, then the modification should be made. If there is no reasonable theoretical underpinning, such suggestions should be ignored.

Evaluating Model Fit


Once one has run the model, SEM software will print out an array of fit measures. Each fit measure is designed to give information about how well the model fits the data in the dataset. Unfortunately, there are dozens of fit indices described in the SEM literature. That there are so many reflects both that none is perfect and that model fit is central to SEM. Let's examine some of the more common, generally accepted fit measures. Most SEM modelists will report several fit measures in their papers.

Discrepancy Function
The most basic fit index is the generalized likelihood ratio. Maximum likelihood estimation minimizes the discrepancy function by deriving parameter estimates that yield predicted covariances as close as possible to the observed values in a particular sample. In large samples, this likelihood ratio can be interpreted, assuming the model is correct, as a Pearson Chi-square. The degrees of freedom are the difference between the number of observations and the number of parameters the model must estimate, as previously discussed. A just-identified model is one with no degrees of freedom. An over-identified model is one with positive degrees of freedom. The Chi-square tests whether the over-identified model fits worse than if it were just identified. Suppose an SEM model with 5 degrees of freedom yields a discrepancy function of Χ² = 16.37; this yields p = 0.01, that is, the model with 5 degrees of freedom is significantly worse than if it had 5 more parameters and 0 degrees of freedom. A statistically significant discrepancy value means the fit isn't as good as it could be. However, the discrepancy function has recently been criticized for being overly sensitive to small differences in fit between the model and the data. In a way, it's a catch-22 for all SEM models: SEM is a large-sample technique, and with large samples, even small deviations from well-fitting models will be statistically significant. This is analogous to the statistically significant but clinically meaningless differences one often finds in large datasets. For example, one recent analysis of a large DOD dataset found a statistically significant difference (p<0.001) in potassium between patients surviving and not surviving a myocardial infarction. However, the difference in potassium was very small (4.32 vs 4.31), a clinically insignificant difference, found to be significant only because of the large number of subjects in the dataset.
Some researchers divide the Chi-square by the number of degrees of freedom. A rule of thumb is that a ratio less than 2 indicates a well-fitting model; less than 3 is considered acceptable, and greater than 5 definitely not acceptable. In this example, the value would be 16.37/5, or about 3.3: still not an adequately fitting model, but not as bad as p = 0.01 might suggest.
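Both calculations are easy to reproduce; a minimal sketch using scipy:

from scipy.stats import chi2

chisq, df = 16.37, 5
p = chi2.sf(chisq, df)     # upper-tail p-value of the discrepancy Chi-square
ratio = chisq / df         # rule of thumb: <2 good, <3 acceptable, >5 poor

print(round(p, 3))         # significant at the 0.01 level
print(round(ratio, 2))     # ~3.27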

Jöreskog-Sörbom Goodness of Fit Index (GFI)


The GFI is analogous to a squared multiple correlation; it indicates the proportion of the
observed covariance explained by the model covariance. Like a squared multiple correlation, it
varies from 0-1 with 1 being a perfect fit. A variant of this statistic is the Adjusted Goodness of
Fit (AGFI) index, which includes an adjustment for model complexity. This is done because
the more parameters included in any model, the greater the amount of variance explained. The
AGFI takes this into account by correcting downward the value of the GFI as the number of
parameters increases. The AGFI has not performed well in some computer simulations and is
less popular than the GFI. Values greater than 0.9 are considered well fitting.

Bentler-Bonett Normed Fit Index (NFI)
Bentler Comparative Fit Index (CFI)
Bentler-Bonett Non-Normed Fit Index (NNFI)
The NFI indicates the proportionate improvement in the overall fit of the researcher's model relative to a null model, typically the "independence" model. The independence model is one in which all variables are assumed to be uncorrelated. An NFI of .80 means that the overall fit of the tested model is 80% better than that of the independence model, based on the sample data. The CFI can be interpreted the same way, but is less affected by sample size. The NNFI (formerly known as the Tucker-Lewis Index) includes a correction for model complexity, much like the AGFI. NFI, CFI, and NNFI values greater than 0.9 are considered well-fitting.

Root Mean Square Error of Approximation (RMS or RMSEA)


This is a standardized summary of the average covariance residuals. Covariance residuals are the differences between the observed and model-implied covariances. When the fit of the model is perfect, the RMSEA equals zero. As the average discrepancy between the observed and predicted covariances increases, so does the value of the RMSEA.

A value of the RMSEA of about .05 or less would indicate a close fit of the model in
relation to the degrees of freedom.

Akaike's Information Criterion (AIC)
Bozdogan's Consistent version of the AIC (CAIC)
Browne-Cudeck Criterion (BCC)
Bayes Information Criterion (BIC)
Expected Cross-validation Index (ECVI)
Both the AIC and the CAIC address the issue of parsimony in the assessment of model fit. The AIC and CAIC are used in the comparison of two or more models, with smaller values representing a better fit of the hypothesized model. The AIC is a modification of the standard goodness-of-fit Chi-square that includes a penalty for complexity. The AIC provides a quantitative method for model selection, whether or not the models are hierarchical. The AIC is defined as AIC = -2·ln(L) + 2q, where ln(L) is the log-likelihood and q is the number of parameters in the model. Adding additional parameters (and increasing the complexity of a model) will always improve the fit; however, it may not improve the fit enough to justify the added complexity. To use it, the AIC is computed for each candidate model and the one with the lowest AIC is selected. The AIC has the best literature supporting its use and is based on strong underlying statistical reasoning; it has been subjected to numerous Monte Carlo simulations and found to be well behaved. These measures reflect the extent to which parameter estimates from the original sample will cross-validate in future samples. The CAIC is a modification of the AIC that takes into account the sample size; it assigns a greater penalty to model complexity than either the AIC or BCC, but not as great as the BIC. The AIC, CAIC, and BCC assume that no true model exists and simply try to find the best model among those being considered. The BCC was developed specifically for the analysis of structural equation models, and there is some evidence that it may be superior to the other measures; it imposes a slightly greater penalty for model complexity than does the AIC. The BIC assigns a greater penalty for complexity than the AIC, BCC, and CAIC, and hence has a greater tendency to pick parsimonious models. The BIC has been compared directly to the AIC in Monte Carlo simulations and found to perform comparably. The BIC assumes that the true model is in the set of candidate models and that the goal of model selection is to find that true model; this requires the sample size to be very large.
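A minimal sketch of the AIC calculation and of comparing candidate models (the log-likelihoods and parameter counts below are hypothetical):

def aic(log_likelihood: float, q: int) -> float:
    """AIC = -2*ln(L) + 2q: fit rewarded, complexity penalized."""
    return -2.0 * log_likelihood + 2.0 * q

# Model B fits slightly better but spends 3 more parameters
candidates = {"A": aic(-512.4, 8), "B": aic(-510.9, 11)}
best = min(candidates, key=candidates.get)
print(candidates, "-> select", best)   # lower AIC wins; here the penalty favors A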

In most outputs, these measures are applied to three models: the "default" model (the model that you have specified), the independence model (one in which it is assumed that there are no correlations between any of the variables), and the saturated model (one in which there are as many parameters as there are observations). In a sense, one is looking to see how bad your model is compared with a "worst-case" scenario. The closer the AIC, CAIC, BCC, BIC, or ECVI is to the saturated value and the further from the independence model, the better the fit. The more common use is when one has developed two reasonably well-fitting models; these measures can then help decide which model to use.

Hoelter's critical N.
This fit statistic focuses on the adequacy of sample size rather than on model fit. Its purpose is to estimate a sample size that would be sufficient to yield an adequate model fit for a Χ² test. A model is considered adequately fitting if Hoelter's N is greater than 200.

Fitting your model--practical tips
After one has entered the model into an SEM software package and run the analysis, one then looks at the fit measures to see whether the hypothesis about the nature of the web of relationships is accurate. Sometimes one gets it right the first time; more often, one finds, on looking over the array of fit measures, that the proposed model is not an adequate fit. One then has several choices:
1) Abandon the whole project---not generally recommended.
2) Respecify another a priori model, based on theory and your knowledge. It may be that you had several possible models in mind, based on your knowledge of the nature of the variables being analyzed and on previous work. One could say Model A was incorrect, but let's test Model B.
3) Modify the original model to make it better fitting. Once one does this, one has strayed away from purely confirmatory, a priori model testing, and one has to be wary of "overfitting." It may be possible to tweak the model to make it fit this particular set of data very well, but in so tweaking, one winds up with a model that is so shaped around a particular set of observations as to make it unlikely to be replicable or generalizable. Having issued this warning, this is probably the most common approach: to examine where the model doesn't fit and see if one can fix it so it is well-fitting.
So how does one go about doing this? First, look at the estimates for the regression coefficients and the specified covariances. The ratio of a coefficient to its standard error is equivalent to a z test for the significance of the relationship, with a p=0.05 cutoff of about 1.96. In examining the regression weights and covariances in the model originally specified, it is likely that one will find several regression weights or covariances that are not statistically significant. Pruning these covariances or paths back will make the model fit better; this is the usual first step in model fit improvement. After pruning back the model, one reruns it to see if the model fit is now adequate. In some cases it will be, and no further "tweaking" will be necessary; in others, this may not be enough. In this process of pruning, one should recognize that what one has is a new model that is a subset of the previous one; in SEM parlance, the new model is a "nested" version of the original one. Another term for this is that the models are "hierarchical": one is a subset of the other. In this case, the difference in the Chi-square is a test for whether some important information has been lost, with the degrees of freedom of this Chi-square equal to the number of trimmed paths. For example, if the original model had a Chi-square of 187.3 and, after trimming back two paths that appear not to be significant, the new Chi-square is 185.2, the difference of 2.1 on 2 degrees of freedom is not statistically significant, so one has not lost important information with these path eliminations.
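That nested-model comparison is a one-line computation; a sketch with scipy, using the numbers from the example:

from scipy.stats import chi2

chisq_full, chisq_trimmed, n_trimmed = 187.3, 185.2, 2
diff = abs(chisq_full - chisq_trimmed)   # Chi-square difference between nested models
p = chi2.sf(diff, df=n_trimmed)          # df equals the number of trimmed paths

print(round(diff, 1), round(p, 2))       # 2.1 with 2 df, p ~0.35: not significant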

What can one do if one prunes back nonsignificant paths and covariances and the model fit is still not adequate? Most SEM programs will issue a set of "modification indices" as part of their output. These are suggestions for changes in the model that would improve model fit. Usually the output also includes some indication of how much change one would expect to see in the Chi-square if one made the change; usually this estimate is conservative, and the actual Chi-square change seen is much larger. Here the analyst is treading on particularly thin ice. Adding a set of paths and covariances will very likely eventually produce a model that "fits" the data. The SEM software will suggest all changes that will improve model fit, but those changes may be nonsensical. The analyst should look over the modification indices with a cautious eye, and should only consider making those changes that make sense from a biological or theoretical perspective. The modification indices may suggest creating a new path from variable X to variable Z; if this modification makes sense (and often it does), then consider making the change. Other suggestions are nonsense from a theoretical perspective, no matter how much they may improve model fit, and should be avoided. Here is an example of a set of modification indices:
Modification Indices
--------------------

Covariances:                M.I.    Par Change
                         ---------  ----------
symptoms <------> mdage      7.043      -1.953
serious0 <---> severity      6.403       0.132
sex <-------------> age      4.034      -0.806
sex <----------> stress      7.609       0.027
un0 <-------------> psy      6.886       0.014
expl0 <---------> mdage      5.677      -0.325
expl0 <-----------> un0      9.663      -0.019

Variances:                  M.I.    Par Change
                         ---------  ----------

Regression Weights:         M.I.    Par Change
                         ---------  ----------
serious0 <--------- psy      5.739       0.093
sex <-------------- psy      8.412       0.118
sex <----------- stress      9.703       0.107
un0 <-------------- psy      8.769       0.078
expl0 <------------ un0      8.644      -0.168

Means:                      M.I.    Par Change
                         ---------  ----------

Intercepts:                 M.I.    Par Change
                         ---------  ----------

In this example, the first group of modification indices suggests adding covariance terms, the second variances, the third regression weights, the fourth means, and the fifth intercepts. In this case, only modifications in covariances and regression weights are being suggested. For example, adding a covariance relationship between the terms symptoms and mdage will reduce the Chi-square by 7.043, a 1.9% decrease. In this case, the suggested modification makes no intuitive sense: it is hard to imagine how there could be a relationship between the number of symptoms a patient reports on a survey prior to being seen by the clinician and the age of the clinician, particularly since the assignment of patients to doctors in this dataset was random. While the model would be better fitting with such an addition, it makes no sense and should not be done. On the other hand, the next suggested modification is to add a covariance term between patient serious illness worry and the severity of the presenting physical symptom. This is a relationship that one could believe to be real; it would reduce the Chi-square statistic by 6.4 and would be worth doing.

Popular Fit Metrics

Fit Metric                       Target                     Comment
-------------------------------  -------------------------  -----------------------------------------
Chi-square                       p > 0.05                   With large samples, even minor deviations
                                                            from fit may be statistically significant
Chi-square/degrees of freedom    <2 excellent fit;          Compensates for sample size
                                 3-5 okay fit; >5 poor fit
Jöreskog-Sörbom Goodness of      1 is a perfect fit;        The proportion of the observed covariance
Fit Index (GFI)                  goal is >0.9               explained by the model covariance
Jöreskog-Sörbom Adjusted         1 is a perfect fit;        Includes adjustment for model complexity
Goodness of Fit Index (AGFI)     goal is >0.9
Bentler Comparative Fit          Closer to 1 the better     Asks how much better the data fit than a
Index (CFI)                      the fit; goal is >0.9      model in which variables have no
                                                            relationship (independence model)
Tucker-Lewis Index               Closer to 1 the better     Includes adjustment for model complexity
                                 the fit; goal is >0.9
Root Mean Square Error of        <0.05 is a close fit       Average difference between observed and
Approximation (RMSEA)                                       model-implied covariances
Hoelter's N                      >200 is a good fit         Estimates a sample size that would be
                                                            sufficient to yield an adequate model fit

