Ess Sem Koolitus 02-10-14

Introduction to
Structural Equation Modeling

with Amos
Dr. Lluís Coromina (University of Girona, Spain)

Email: lluis.coromina@udg.edu
2nd and 3rd October 2014

Outline
Introduction
Basic concepts. Types of variables. Basic composition
Intuitive explanation of the basics of SEM
o Path analysis. The regression analysis model
o Indirect effects. Equations. Degrees of freedom. Specification errors
Measurement errors in regression models
Full SEM model
Confirmatory Factor Analysis (CFA)
o Scale Reliability and Validity of a Construct
SEM and modeling stages.
o Model specification
o Model identification
o Model estimation
o Fit diagnostics and model modification
Results and interpretation
Model modification
Introduction
To introduce models that relate variables measured with error.

To introduce Structural Equation Models with latent variables (SEM).
To learn all stages of fitting these models.
To become familiar with the Amos software.
To enable participants to critically read articles in which these models are applied.
History
SEM make it possible to:

o Fit linear relationships among a large number of variables. Possibly more than
one is dependent.
o Validate a questionnaire as a measurement instrument.
o Quantify measurement error and prevent its biasing effect.
o Freely specify, constrain and test each possible relationship using theoretical
knowledge, testing hypotheses.
In their most recent and advanced versions, SEM enable researchers to:
o Analyze non-normal data.
o Treat missing values by maximum likelihood.
o Treat complex sample data.
History of models for the study of causality
Analysis of variance (1920-1930): decomposition of the variance of a dependent
variable in order to identify the part contributed by an explanatory variable. Control
of third variables (experimental design).
Macroeconometric models (1940-50): dependence analysis of non-experimental

data. All variables must be included in the model.
Path analysis (1920-70): analysis of correlations. Otherwise similar to econometric

models.
Factor analysis (1900-1970): analysis of correlations among multiple indicators of the

same variable. Measurement quality evaluation.
SEM (1970): Econometric models, path analysis and factor analysis are joined
together. Relationships among variables measured with error, on non-experimental
data from an interdependence analysis perspective.
History of models for the study of causality
SEM are nowadays very popular because they make it possible to (5 Cs, see Batista &
Coenders 2000):
Work with Constructs/factors/latent variables measured through indicators/observed
variables/manifest variables, and evaluate measurement quality.
Consider the true Complexity of phenomena, thus abandoning uni and bivariate
statistics.
Conjointly consider measurement and prediction, factor and path analysis, and thus
obtain estimates of relationships among variables that are free of measurement error
bias.
Introduce a Confirmatory perspective in statistical modelling. Prior to estimation, the
researcher must specify a model according to theory.
Decompose observed Covariances, and not only variances, from an interdependence
analysis perspective.
Basic Concepts
Latent variables (theoretical concepts that cannot be observed directly) =

unobserved = unmeasured
Observed variables (indicators of the underlying construct which they are

presumed to represent)= manifest = measured
Basic Concepts
Exogenous (Independent) vs Endogenous (dependent) latent variables.
F1 causes F2
Changes in the values of the exogenous variables are not explained by the model.
Rather, they are considered to be influenced by other factors external to the model
(background variables such as gender, age, etc.).
Fluctuations in the endogenous variables are said to be explained by the model because
all latent variables that influence them are included in the model specification.
Statistical Modeling
Models explain how the observed and latent variables are related to one another.
Diagram
Equations
Specification: model based on the researcher's knowledge of the related theory
Testing on sample data
Goodness of fit between the hypothesized model and sample data. Testing how well
the observed data fit the restricted structure.
Observed data - Hypothesized model = Residual
DATA = MODEL + RESIDUAL
Types of variables
Observed variables
Unobserved latent factors
Measurement error associated with an observed variable

ei: reflects the adequacy of the observed variable in measuring the related unobserved
(underlying) factor.
Residual error (disturbance) in the prediction of an unobserved factor
Covariance or correlation
Path coefficient for regression of one factor onto another factor.

Direct relationship
Path coefficient for regression of an observed variable onto an

unobserved latent variable (or factor).
Direct relationship
Spurious relationship: both have a common cause
Indirect relationship: both are related by an intervening variable v3
Joint effect: it differs from the spurious and indirect relationships in that
v1 and v3 are both exogenous, so that it is not clear whether v3
contributes to the covariance between v1 and v2 through an indirect or a
spurious mechanism.
Factor analytic model
Factor Analysis: analysis of covariances among observed variables in order to get
information about the underlying latent factors.
EFA: Exploratory Factor Analysis. EX: design of a new instrument to measure
satisfaction with life.
CFA: Confirmatory Factor Analysis. Measurement model in Structural Equation
Modeling (SEM). EX: Knowledge of the theory. Hypothesis testing.
Factor loadings: Regression paths from the factors to the observed variables.
Full latent variable model
Allows specification of regression structure among latent variables.

Testing of the hypothesis of the impact of one latent construct on another in the
modeling of causal direction.
Full model = measurement model + Structural Model
Recursive Full model: direction of cause from one direction only

Nonrecursive Full model: allows for reciprocal or feedback effects.
Example: European Social Survey (ESS)
Estonian data from ESS.
Year: 2012
Ppltrst= Most people can be trusted or you can't be too careful (0= You can't be too
careful; 10= Most people can be trusted)
Pplfair= Most people try to take advantage of you, or try to be fair (0= Most people try
to take advantage of me; 10= Most people try to be fair)
Pplhlp= Most of the time people helpful or mostly looking out for themselves (0=
People mostly look out for themselves; 10=People mostly try to be helpful)
Trstprl= Trust in country's parliament (0= No trust at all; 10= Complete trust)
Trstplt= Trust in politicians (0= No trust at all; 10= Complete trust)
Trstlgl= Trust in the legal system (0= No trust at all; 10= Complete trust)
Sample Covariances
ppltrst pplfair pplhlp trstprl trstplt trstlgl
ppltrst 4,956
pplfair 2,698 4,980
pplhlp 2,055 2,225 5,093
trstprl 1,668 1,584 1,369 6,020
trstplt 1,404 1,300 1,103 3,977 5,080
trstlgl 1,705 1,602 1,355 4,153 3,370 6,204
Sample Correlations
ppltrst pplfair pplhlp trstprl trstplt trstlgl
ppltrst 1,000
pplfair ,543 1,000
pplhlp ,409 ,442 1,000
trstprl ,305 ,289 ,247 1,000
trstplt ,280 ,258 ,217 ,719 1,000
trstlgl ,308 ,288 ,241 ,680 ,600 1,000
The path model
* The ei terms are measurement errors (random measurement error and systematic, i.e.
non-random, error).
* Residual terms (d1) represent error in the prediction of endogenous (Political Trust)
factors from exogenous (Political Satisfaction) factors.
* Every dependent variable is assigned an error term (a measurement error if it is an observed
variable and a disturbance if it is latent).
Basic composition
Measurement model: relations between observed and unobserved variables. CFA:
pattern by which each measure loads on a particular factor.
Structural model: Relations between unobserved variables.
A particular latent variable directly or indirectly influences (causes) changes in the values
of certain other latent variables in the model.
[Figure: path diagram with the structural model and the measurement model marked]
Examples and basic concepts. Simple linear regression model.
Introduction to interdependence analysis
The specification of a SEM consists in a set of assumptions regarding the behaviour of
the variables involved.
Substantive part: it requires translating verbal theories into equations.
Statistical part: it is needed for the eventual estimation and testing of the model. The
assumptions regard the distribution of the variables involved.
Substantive assumptions:
$v_2 = \beta_{21} v_1 + d_2$
Linearity.
$\beta_{21}$ (effect): by how much will the expected value of v2 increase following a unit increase
in v1?
Standardized $\beta_{21}$: by how many standard deviations will the expected value of v2
increase following a one standard deviation increase in v1?
d2 collects the effect of omitted explanatory variables, measurement error in v2 and the
random and unpredictable part of v2 (disturbance).
v1 is assumed to be free of measurement error.
Statistical assumptions regarding the joint distribution of the sources of variation:
$$\begin{pmatrix} v_1 \\ d_2 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 \\ 0 & \psi_{22} \end{pmatrix} \right]$$
Two additional parameters: the variances of v1 ($\phi_{11}$) and d2 ($\psi_{22}$).
Bivariate normal joint distribution of v1 and d2.
Variables are mean-centred.
v1 and d2 are uncorrelated (inclusion of all relevant variables).
If this holds, the variance of v2 can be additively decomposed into explained variance
and disturbance variance. $R^2$ is the explained percentage.
Equations exhaustively describe the joint distribution of v1 and v2 as a function of 3
parameters.
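In order to derive the structural equation system $\Sigma = \Sigma(\pi)$ we can apply path analysis:
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$$
For a model with k observed variables, the number of distinct elements in $\Sigma$ is (k+1)k/2.
$\pi = (\phi_{11}, \psi_{22}, \beta_{21})$
$$\sigma_{11} = \phi_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \beta_{21}\sigma_{21} = \psi_{22} + \phi_{11}\beta_{21}^2$$
Determination coefficient $R^2 = 1 - (\psi_{22}/\sigma_{22})$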
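It is possible to solve the system $\Sigma(\pi) = \Sigma$ as it contains an equal number of equations
(distinct elements of $\Sigma$) and unknowns (elements of $\pi$): the model is exactly identified.
We can estimate $\Sigma$ from a sample covariance matrix:
$$S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}$$
and estimate $\pi$ by $p = (\hat\phi_{11}, \hat\psi_{22}, \hat\beta_{21})$ by solving
the system $\Sigma(p) = S$:
$$\hat\phi_{11} = s_{11}$$
$$\hat\phi_{11}\hat\beta_{21} = s_{21} \;\Rightarrow\; \hat\beta_{21} = \frac{s_{21}}{s_{11}}$$
$$\hat\psi_{22} + \hat\phi_{11}\hat\beta_{21}^2 = s_{22} \;\Rightarrow\; \hat\psi_{22} = s_{22} - \frac{s_{21}^2}{s_{11}}$$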
Example
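trstprl = $\beta_{21}$*ppltrst + d2
Trust in the country's parliament (v2=trstprl) can be explained by the level of trust in others (v1=ppltrst):
$$S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix} = \begin{pmatrix} 4.956 & 1.668 \\ 1.668 & 6.020 \end{pmatrix}$$
$$\hat\phi_{11} = s_{11} = 4.956$$
$$\hat\beta_{21} = s_{21}/s_{11} = 1.668/4.956 = 0.34$$
$$\hat\psi_{22} = s_{22} - s_{21}^2/s_{11} = 6.020 - (1.668^2/4.956) = 5.46$$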
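The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the inputs are the rounded sample moments shown above, so the results agree with Amos only up to rounding.

```python
# Closed-form estimates for v2 = beta21*v1 + d2 from the sample covariances
s11, s21, s22 = 4.956, 1.668, 6.020   # var(ppltrst), cov(ppltrst, trstprl), var(trstprl)

phi11 = s11                              # variance of the exogenous variable
beta21 = s21 / s11                       # regression weight
psi22 = s22 - s21**2 / s11               # disturbance variance
r2 = 1 - psi22 / s22                     # determination coefficient
beta21_std = beta21 * (s11 / s22) ** 0.5 # standardized regression weight

print(round(beta21, 3), round(psi22, 3), round(r2, 3), round(beta21_std, 3))
```

The standardized weight and R2 computed this way can be compared with the Amos and SPSS output for the same model.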
The estimate of $\beta_{21}$ is identical to the ordinary least squares estimate (dependence analysis).
In statistical analysis, a function of residuals (e.g. the sum of squares) is used as:
A criterion function to minimize during estimation.
A goodness of fit measure.
In a dependence analysis, a residual = v2 - $\hat\beta_{21}$v1.
In an interdependence analysis residuals are differences between the covariances fitted by
the model parameters, $\Sigma(p)$, and the sample covariances S.
They are arranged in the $S - \Sigma(p)$ residual matrix.
In an exactly identified model they are zero as $S = \Sigma(p)$ has a solution.
AMOS output for trstprl = $\beta_{21}$*ppltrst + d2
Notes for Group (Group number 1)
The model is recursive.

Sample size = 2330
Variable Summary (Group number 1)
Your model contains the following variables (Group number 1)
Observed, endogenous variables

trstprl
Observed, exogenous variables
ppltrst
Unobserved, exogenous variables
d2
Variable counts (Group number 1)
Number of variables in your model: 3

Number of observed variables: 2
Number of unobserved variables: 1
Number of exogenous variables: 2
Number of endogenous variables: 1
Parameter Summary (Group number 1)
Weights Covariances Variances Means Intercepts Total

Fixed 1 0 0 0 0 1
Labeled 0 0 0 0 0 0
Unlabeled 1 0 2 0 0 3
Total 2 0 2 0 0 4
Sample Moments (Group number 1)
Sample Covariances (Group number 1)
ppltrst trstprl
ppltrst 4,956
trstprl 1,668 6,020
Sample Correlations (Group number 1)

ppltrst trstprl
ppltrst 1,000
trstprl ,305 1,000
Notes for Model (Default model)
Computation of degrees of freedom (Default model)
Number of distinct sample moments: 3

Number of distinct parameters to be estimated: 3
Degrees of freedom (3 - 3): 0
Result (Default model)
Minimum was achieved

Chi-square = ,000
Degrees of freedom = 0
Probability level cannot be computed
Estimates (Group number 1 - Default model)
Scalar Estimates (Group number 1 - Default model)
Maximum Likelihood Estimates
Regression Weights: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
trstprl <--- ppltrst ,337 ,022 15,479 *** par_1
Critical Ratio (C.R.): dividing the regression weight estimate by the estimate of its standard
error gives z = ,337/,022 = 15,479 (the rounded figures shown give about 15,3; the reported
value presumably uses unrounded estimates). Sometimes it is called t-value.
Standardized Regression Weights: (Group number 1 - Default model)
Estimate
trstprl <--- ppltrst ,305
Variances: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
ppltrst 4,956 ,145 34,125 *** par_2
d2 5,458 ,160 34,125 *** par_3
The exogenous variance (ppltrst) is trivially equal to the sample variance 4.956.
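As an illustration of how the critical ratio is read, the following sketch computes z from the rounded estimate and standard error and a two-sided p-value. It assumes a standard normal reference distribution; the variable names are illustrative and not part of the Amos output.

```python
import math

estimate, std_error = 0.337, 0.022   # rounded values from the regression weights table

z = estimate / std_error                          # critical ratio
p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| > |z|) for a standard normal Z

print(round(z, 2), p_two_sided < 0.001)
```

A p-value below 0.001 corresponds to the "***" shown in the P column.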
Squared Multiple Correlations: (Group number 1 - Default model)
Estimate
trstprl ,093 (R2)
Standardized Residual Covariances (Group number 1 - Default model)
ppltrst trstprl
ppltrst ,000
trstprl ,000 ,000
In an exactly identified model they are zero, as $S = \Sigma(p)$ has a solution.
Simple Regression with SPSS: trstprl = $\beta_{21}$*ppltrst + d2
Model   R      R squared   Adjusted R squared
1       ,305   ,093        ,093
Coefficients (a)
Model 1                                                  B       s.e.   Beta   t        Sig.
(Constant)                                               2,093   ,129          16,238   ,000
Most people can be trusted or you can't be too careful   ,337    ,022   ,305   15,476   ,000
B and s.e.: non-standardized coefficients. Beta: standardized coefficient.
a. Dependent variable: Trust in country's parliament
Model with two dependent variables and an indirect effect.
Identification, goodness of fit and specification errors
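$v_2 = \beta_{21}v_1 + d_2$    $v_3 = \beta_{32}v_2 + d_3$
$$\begin{pmatrix} v_1 \\ d_2 \\ d_3 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 & 0 \\ 0 & \psi_{22} & 0 \\ 0 & 0 & \psi_{33} \end{pmatrix} \right]$$
$\Sigma$ is $3 \times 3$ and contains $4 \cdot 3/2 = 6$ non-duplicated elements.
$\pi$ has 5 elements ($\phi_{11}, \psi_{22}, \psi_{33}, \beta_{21}, \beta_{32}$).
The difference is the number of degrees of freedom (df) of the model.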
Structural equation system:
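$$\sigma_{11} = \phi_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \beta_{21}\sigma_{21}$$
$$\sigma_{31} = \phi_{11}\beta_{21}\beta_{32}$$
$$\sigma_{32} = \sigma_{22}\beta_{32}$$
$$\sigma_{33} = \psi_{33} + \beta_{32}\sigma_{32}$$
EXERCISE: Derive these equations using path analysis.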
AMOS OUTPUT
Notes for Group (Group number 1)

Sample size = 2330

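Observed, endogenous variables
trstprl
stfeco
Observed, exogenous variables
ppltrst
Unobserved, exogenous variables
d2
d3
Parameter Summary (Group number 1)
Weights Covariances Variances Means Intercepts Total
Fixed 2 0 0 0 0 2
Labeled 0 0 0 0 0 0
Total 4 0 3 0 0 7
Sample Covariances (Group number 1)
ppltrst trstprl stfeco
ppltrst 4,956
trstprl 1,668 6,020
stfeco 1,382 3,007 4,885
Sample Correlations (Group number 1)
ppltrst trstprl stfeco
ppltrst 1,000
trstprl ,305 1,000
stfeco ,281 ,554 1,000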


Chi-square = 46,528
Probability level = ,000
Estimates (Group number 1 - Default model) Maximum Likelihood Estimates
Estimate S.E. C.R. P Label

trstprl <--- ppltrst ,337 ,022 15,479 *** par_1
stfeco <--- trstprl ,499 ,016 32,156 *** par_2
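Standardized Regression Weights: (Group number 1 - Default model)
Estimate
trstprl <--- ppltrst ,305
stfeco <--- trstprl ,554
Variances: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
ppltrst 4,956 ,145 34,125 *** par_3
d2 5,458 ,160 34,125 *** par_4
d3 3,383 ,099 34,125 *** par_5
Squared Multiple Correlations: (Group number 1 - Default model) (R2)
Estimate
trstprl ,093
stfeco ,307
Standardized Residual Covariances (Group number 1 - Default model)
ppltrst trstprl stfeco
ppltrst ,000
trstprl ,000 ,000
stfeco 5,303 ,000 ,000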
Degrees of freedom introduce restrictions in the covariance space.
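From the equations:
$$\sigma_{11} = \phi_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \beta_{21}\sigma_{21}$$
$$\sigma_{31} = \phi_{11}\beta_{21}\beta_{32}$$
$$\sigma_{32} = \sigma_{22}\beta_{32}$$
$$\sigma_{33} = \psi_{33} + \beta_{32}\sigma_{32}$$
it follows that:
$$\sigma_{31} = \phi_{11}\beta_{21}\beta_{32} = \sigma_{21}\beta_{32} = \frac{\sigma_{21}\sigma_{32}}{\sigma_{22}}$$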
This derives from many explicit or implicit restrictions of the model.

The existence of degrees of freedom implies higher parsimony. It is a true model in the
scientific sense, which is a simplification of reality.
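A quick numerical check of this restriction on the example data (ppltrst, trstprl, stfeco) can be sketched as follows. The variable names are illustrative; the inputs are the rounded sample covariances from the Amos output.

```python
# Restriction implied by the model: sigma31 = sigma21 * sigma32 / sigma22
s21, s22 = 1.668, 6.020    # cov(trstprl, ppltrst), var(trstprl)
s31, s32 = 1.382, 3.007    # cov(stfeco, ppltrst), cov(stfeco, trstprl)

implied_s31 = s21 * s32 / s22     # value of sigma31 the model allows
residual_31 = s31 - implied_s31   # element of S - Sigma(p) for this pair

print(round(implied_s31, 3), round(residual_31, 3))
```

The sample covariance is clearly larger than the value the restriction allows, which is consistent with the single non-zero standardized residual in the output.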
The existence of degrees of freedom affects estimation.
In general, no p vector of estimates will exactly satisfy $\Sigma(p) = S$.
Estimation consists in finding a p vector that leads to an $S - \Sigma(p)$ matrix with small values.
A function of all elements in $S - \Sigma(p)$, called the fit function, is minimized ($F_{min}$).
The existence of degrees of freedom makes it possible to test the model fit. A model
with df=0 leads to a p vector that always fulfils $\Sigma(p) = S$ or $S - \Sigma(p) = 0$ and thus perfectly
fits any data set.
In a correct model with df>0, $\Sigma(\pi) = \Sigma$ in the population and $\Sigma(p) \approx S$ in the sample. If
$S - \Sigma(p)$ contains large values, we can say that some of the restrictions are false.
If assumptions are fulfilled and under H0 (null hypothesis: the model contains all
necessary parameters), a transformation of the minimum value of the fit function
follows a $\chi^2$ distribution, which makes it possible to test the model restrictions (significance of
omitted parameters). Note that standard testing procedures in statistical modelling
(e.g. t-values) test the parameters which are present in the model.
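To make the test statistic concrete, here is a minimal sketch of the maximum likelihood fit function and the resulting chi-square for the model with the indirect effect. Several things are assumptions rather than part of the slides: the function name `ml_chi_square`, the use of (N - 1) as the multiplier, and the use of the equation-by-equation estimates as p (they reproduce every sample moment except s31). Because the inputs are rounded, the result should land close to, but not exactly on, the chi-square reported by Amos.

```python
import numpy as np

def ml_chi_square(S, Sigma, n):
    """(n - 1) times the ML discrepancy between S and the model-implied Sigma."""
    k = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    f_ml = logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - k
    return (n - 1) * f_ml

# Sample covariances of ppltrst (v1), trstprl (v2), stfeco (v3)
S = np.array([[4.956, 1.668, 1.382],
              [1.668, 6.020, 3.007],
              [1.382, 3.007, 4.885]])

beta21 = S[1, 0] / S[0, 0]
beta32 = S[2, 1] / S[1, 1]

# Implied matrix: identical to S except for sigma31 = phi11 * beta21 * beta32
Sigma = S.copy()
Sigma[2, 0] = Sigma[0, 2] = S[0, 0] * beta21 * beta32

chi2 = ml_chi_square(S, Sigma, n=2330)
print(round(chi2, 1))
```

With df=1, a chi-square of this size leads to rejecting the restriction, in line with the probability level of ,000 in the output.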
Specification errors
Errors such as the omission of important explanatory variables, the omission of model
parameters, or the inclusion of wrong restrictions are known as specification errors.
Specification errors are frequent. In general, a specification error can bias any
parameter estimate.
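If the model is incorrect because v3 receives a direct effect from v1:
$v_2 = \beta_{21}v_1 + d_2$
$v_3 = \beta_{31}v_1 + \beta_{32}v_2 + d_3$
and we apply path analysis, then we observe that the new parameter affects $\sigma_{31}$, $\sigma_{32}$ and $\sigma_{33}$:
$$\sigma_{11} = \phi_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \beta_{21}\sigma_{21}$$
$$\sigma_{31} = \phi_{11}\beta_{21}\beta_{32} + \phi_{11}\beta_{31}$$
$$\sigma_{32} = \sigma_{22}\beta_{32} + \sigma_{21}\beta_{31}$$
$$\sigma_{33} = \psi_{33} + \beta_{32}\sigma_{32} + \beta_{31}\sigma_{31}$$
If we fit the model without $\beta_{31}$ to covariances generated by these equations, we find $\sigma_{31}$ to be affected by
the absent parameter $\beta_{31}$ but fitted only by the present parameters $\beta_{21}$ and $\beta_{32}$. $\beta_{21}$ and
$\beta_{32}$ will be biased.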
Attempts must be made to detect specification errors by all means, both statistical and
theoretical:
Specification errors are undetectable in any model with df=0. They are also
undetectable if they involve variables that are NOT in the model.
It can happen that many models with different interpretations have a similarly good
fit, even an exactly equal fit (equivalent models).
The following model has a completely different causal interpretation:

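$v_1 = \beta_{12}v_2 + d_1$    $v_2 = \beta_{23}v_3 + d_2$
$$\begin{pmatrix} d_1 \\ d_2 \\ v_3 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \psi_{11} & 0 & 0 \\ 0 & \psi_{22} & 0 \\ 0 & 0 & \phi_{33} \end{pmatrix} \right]$$
EXERCISE: Derive the $\Sigma(\pi)$ system for this model (it is equivalent to the previous model).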
If we estimate a general model:
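$v_1 = \beta_{12}v_2 + d_1$
$v_2 = \beta_{21}v_1 + \beta_{23}v_3 + d_2$
$v_3 = \beta_{32}v_2 + d_3$
$$\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \psi_{11} & 0 & 0 \\ 0 & \psi_{22} & 0 \\ 0 & 0 & \psi_{33} \end{pmatrix} \right]$$
then the parameter vector includes 7 elements $\pi = (\psi_{11}, \psi_{22}, \psi_{33}, \beta_{12}, \beta_{21}, \beta_{23}, \beta_{32})$ versus
6 (3*4/2) equations: infinite number of solutions (underidentified model).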
Identification of the model
Degrees of freedom (df) = number of distinct elements of matrix S - number of parameters to be estimated =
[k(k+1)]/2 - number of parameters, where k is the number of observed variables
Just identified model (df=0): the number of data variances and covariances equals the
number of parameters. The model yields a unique solution for all parameters, but
scientifically it is not interesting because without degrees of freedom it can never be
rejected. No test of the goodness of fit of the model is possible.
Overidentified model (df>0): the model can be rejected. It makes it possible to analyze the
discrepancy between S and $\Sigma(p)$, thereby rendering it of scientific use. The aim in SEM
is then to specify overidentified models.
Underidentified model: infinite number of solutions. Not useful at all.
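The counting rule can be written as a small helper. This is a sketch with hypothetical function names; it only compares the number of distinct sample moments with the number of free parameters, which is a necessary condition for identification and not a sufficient one.

```python
def degrees_of_freedom(n_observed, n_parameters):
    """Distinct variances and covariances of the observed variables minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_parameters

def identification_status(df):
    """Classify a model by the counting rule only."""
    if df < 0:
        return "underidentified"
    if df == 0:
        return "just identified"
    return "overidentified"

# Models from these slides: (observed variables, free parameters)
examples = {
    "simple regression": (2, 3),
    "indirect effect": (3, 5),
    "general reciprocal model": (3, 7),
}
for name, (k, q) in examples.items():
    df = degrees_of_freedom(k, q)
    print(name, df, identification_status(df))
```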
Simple regression model with errors in the explanatory variable.
Introduction to models with measurement error
The observed explanatory variable (v1) is measured with error (e1). The unobservable
error-free value f1 is called factor or latent variable.
f2 is observed because e2 is for the moment assumed to be zero.
Two equation types:
Relating factors to one another:
$f_2 = \beta_{21}f_1 + d_2$
Relating factors to observed variables or indicators:
$v_1 = f_1 + e_1$
$v_2 = f_2$
Assumptions:
Measurement errors are uncorrelated with factors (as in factor analysis).
Disturbances are uncorrelated with the explanatory factor (as in regression).
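$$\begin{pmatrix} f_1 \\ e_1 \\ d_2 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 & 0 \\ 0 & \theta_{11} & 0 \\ 0 & 0 & \psi_{22} \end{pmatrix} \right]$$
These assumptions make it possible to decompose the variance of observed variables
into true score variance (explained by factors) and measurement error variance. $R^2$ is
called measurement quality and is represented as $\kappa$.
$$\kappa_1 = \frac{1^2\phi_{11}}{\sigma_{11}} = \frac{\sigma_{11} - \theta_{11}}{\sigma_{11}} = 1 - \frac{\theta_{11}}{\sigma_{11}}$$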
The structural equations become:
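$$\sigma_{11} = \phi_{11} + \theta_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \phi_{11}\beta_{21}^2$$
Underidentified model: 4 parameters ($\phi_{11}, \theta_{11}, \beta_{21}, \psi_{22}$) and three variances and
covariances (only those of observed variables count).
The OLS estimator assumes that $\theta_{11} = 0$, which is a specification error and leads to bias.
The probability limit of the OLS estimator is:
$$\mathrm{plim}\, \frac{s_{21}}{s_{11}} = \frac{\sigma_{21}}{\sigma_{11}} = \frac{\phi_{11}\beta_{21}}{\sigma_{11}} = \frac{(\sigma_{11} - \theta_{11})\beta_{21}}{\sigma_{11}} = \kappa_1\beta_{21}$$
and is thus biased unless $\kappa_1 = 1$, where $\kappa_1 = \phi_{11}/\sigma_{11}$ is the measurement quality of v1.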
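The attenuation of the OLS slope can be illustrated with a small simulation. This is a sketch under stated assumptions: the parameter values are made up for illustration (they are not estimates from the ESS data), the variables are drawn as independent normals as the model assumes, and NumPy is used for the random draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative population values (hypothetical, not taken from the slides)
phi11, theta11, beta21, psi22 = 2.5, 2.5, 0.6, 5.0

f1 = rng.normal(0.0, np.sqrt(phi11), n)                # error-free explanatory factor
v1 = f1 + rng.normal(0.0, np.sqrt(theta11), n)         # observed indicator with error e1
v2 = beta21 * f1 + rng.normal(0.0, np.sqrt(psi22), n)  # dependent variable, measured without error

cov = np.cov(v1, v2)
ols_slope = cov[0, 1] / cov[0, 0]       # s21 / s11
kappa1 = phi11 / (phi11 + theta11)      # measurement quality of v1

print(round(ols_slope, 3), round(kappa1 * beta21, 3))
```

With half of the variance of v1 being measurement error, the OLS slope settles near half of the true effect rather than near the true effect itself.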
Simple linear regression model with multiple indicators
The solution to measurement error bias in SEM involves the use of multiple indicators,
at least of the explanatory latent variables.
The equations relating factors to indicators become:
$f_2 = \beta_{21}f_1 + d_2$
$v_1 = 1 \cdot f_1 + e_1$
$v_2 = f_2$
$v_3 = \lambda_{31}f_1 + e_3$
The equation of v3 includes a loading $\lambda_{31}$ (L31) which relates the scales of f1 and v3:
The researcher must fix the latent variable scale, usually by anchoring it to the
measurement units of an indicator whose $\lambda$ equals 1.
Standardized instead of raw loadings are usually interpreted. If there is only one
factor per indicator, they lie within -1 and +1 and equal the square root of the measurement quality $\kappa$.
New assumption of uncorrelated measurement errors of different indicators:
$$\begin{pmatrix} f_1 \\ e_1 \\ e_3 \\ d_2 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 & 0 & 0 \\ 0 & \theta_{11} & 0 & 0 \\ 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & \psi_{22} \end{pmatrix} \right]$$
Addition of v3 in the structural equations. This is an exactly identified model, all of
whose parameters can be solved, even those related to unobservable variables. The
extent to which multiple indicators of the same construct converge (correlate) provides
information to estimate the parameters.
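Structural equations:
$$\sigma_{11} = 1^2\phi_{11} + \theta_{11}$$
$$\sigma_{21} = \phi_{11}\beta_{21}$$
$$\sigma_{22} = \psi_{22} + \phi_{11}\beta_{21}^2$$
$$\sigma_{31} = \phi_{11}\lambda_{31}$$
$$\sigma_{32} = \phi_{11}\beta_{21}\lambda_{31}$$
$$\sigma_{33} = \phi_{11}\lambda_{31}^2 + \theta_{33}$$
Solution:
$$\phi_{11} = \frac{\sigma_{21}\sigma_{31}}{\sigma_{32}}$$
$$\lambda_{31} = \frac{\sigma_{31}}{\phi_{11}}$$
$$\beta_{21} = \frac{\sigma_{21}}{\phi_{11}}$$
$$\theta_{11} = \sigma_{11} - \phi_{11}$$
$$\theta_{33} = \sigma_{33} - \phi_{11}\lambda_{31}^2$$
$$\psi_{22} = \sigma_{22} - \phi_{11}\beta_{21}^2$$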
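Because the model is exactly identified, the solution can be evaluated directly on sample covariances. The sketch below does this for the applied example used in these slides (v1=ppltrst, v2=trstprl, v3=pplhlp), taking the rounded sample covariances as input; the variable names are illustrative, and agreement with the Amos estimates is expected only up to rounding.

```python
# Sample variances and covariances: v1 = ppltrst, v2 = trstprl, v3 = pplhlp
s11, s21, s22 = 4.956, 1.668, 6.020
s31, s32, s33 = 2.055, 1.369, 5.093

phi11 = s21 * s31 / s32              # variance of the latent explanatory variable
lambda31 = s31 / phi11               # loading of the second indicator
beta21 = s21 / phi11                 # effect free of measurement error bias
theta11 = s11 - phi11                # error variance of v1
theta33 = s33 - phi11 * lambda31**2  # error variance of v3
psi22 = s22 - phi11 * beta21**2      # disturbance variance

kappa1 = phi11 / s11                 # measurement quality of v1
ols_slope = s21 / s11                # what OLS would report instead of beta21

print(round(phi11, 3), round(lambda31, 3), round(beta21, 3))
print(round(theta11, 3), round(theta33, 3), round(psi22, 3))
print(round(kappa1 * beta21, 3), round(ols_slope, 3))
```

The last line shows the attenuation at work: the OLS slope equals the measurement quality of v1 times the error-free effect.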
Applied example
trstprl (f2=v2) is explained from Social Trust (f1=SocT), measured by its two indicators (v1=ppltrst and v3=pplhlp):
ppltrst = 1*SocT + e1
pplhlp = $\lambda_{31}$*SocT + e3
trstprl = trstprl (v2 = f2)
trstprl = $\beta_{21}$*SocT + d2
The equivalence between the observed and latent dependent variable makes it possible
to simplify the path and equations as:
ppltrst = 1*SocT + e1
pplhlp = $\lambda_{31}$*SocT + e3
trstprl = $\beta_{21}$*SocT + d2
[Figure: path diagram of the simplified model]
We define a latent variable called Social Trust, measured by ppltrst and pplhlp. The
loading of ppltrst (first indicator) is constrained to 1 in order to fix the scale of the latent
variable. Each indicator automatically receives a $\theta$ (error variance of ei) parameter.
The regression is of trstprl (observed, dependent) on Social Trust (latent, explanatory).
This automatically defines a $\beta$ (regression weight), a $\phi$ (variance of the independent
variable) and a $\psi$ (disturbance variance) parameter.
AMOS OUTPUT
Sample size = 2330

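Observed, endogenous variables
ppltrst
pplhlp
trstprl
Unobserved, exogenous variables
SocialTrust
e3
e1
d2
Parameter Summary (Group number 1)
Weights Covariances Variances Means Intercepts Total
Fixed 4 0 0 0 0 4
Labeled 0 0 0 0 0 0
Total 6 0 4 0 0 10
Sample Covariances (Group number 1)
trstprl pplhlp ppltrst
trstprl 6,020
pplhlp 1,369 5,093
ppltrst 1,668 2,055 4,956
Sample Correlations (Group number 1)
trstprl pplhlp ppltrst
trstprl 1,000
pplhlp ,247 1,000
ppltrst ,305 ,409 1,000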


Chi-square = ,000
Probability level cannot be computed
Scalar Estimates (Group number 1 - Default model) Maximum Likelihood Estimates
Regression Weights: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
ppltrst <--- SocialTrust 1,000 (fixed at 1)
pplhlp <--- SocialTrust ,821 ,069 11,972 *** par_1 (free $\lambda_{31}$)
trstprl <--- SocialTrust ,666 ,056 11,954 *** par_2
Standardized Regression Weights:
Estimate
ppltrst <--- SocialTrust ,711
pplhlp <--- SocialTrust ,575
trstprl <--- SocialTrust ,430
Squared Multiple Correlations:
Estimate
trstprl ,185
pplhlp ,331
ppltrst ,505
We can compute R2 as the squared standardized weight: 0,430^2 = 0,185; 0,575^2 = 0,331; 0,711^2 = 0,505.
Variances:
Estimate S.E. C.R. P Label
SocialTrust 2,505 ,239 10,496 *** par_3 ($\phi_{11}$)
e3 3,407 ,169 20,161 *** par_4 ($\theta_{33}$)
e1 2,452 ,215 11,410 *** par_5 ($\theta_{11}$)
d2 4,909 ,170 28,942 *** par_6 ($\psi_{22}$)
Matrices (Group number 1 - Default model)
Implied (for all variables) Covariances (Group number 1 - Default model)
SocialTrust trstprl pplhlp ppltrst

SocialTrust 2,505
trstprl 1,668 6,020
pplhlp 2,055 1,369 5,093
ppltrst 2,505 1,668 2,055 4,956
Implied (for all variables) Correlations (Group number 1 - Default model)

SocialTrust trstprl pplhlp ppltrst
SocialTrust 1,000
trstprl ,430 1,000
pplhlp ,575 ,247 1,000
ppltrst ,711 ,305 ,409 1,000
Full SEM model
Example:
Independent variables (8):
6 errors: e1, e2, e3, e4, e5, e6
1 disturbance: d1
1 latent variable: SocialTrust
Dependent variables (7):
6 represent observed variables: ppltrst; pplfair; pplhlp; trstprl; trstplt; trstlgl
1 represents an unobserved variable (the factor Political Trust).
Definition of the model:
Model Equations
Structural model:
Political Trust = ? * SocialTrust + d1
Social Trust measurement scale model:
ppltrst = 1 * SocialTrust + e1
pplfair = ? * SocialTrust + e2
pplhlp = ? * SocialTrust + e3
Political Trust measurement scale model:
trstprl = 1 * Political Trust + e4
trstplt = ? * Political Trust + e5
trstlgl = ? * Political Trust + e6
Variances of independent variables
e1 = ?; e2 = ?; e3 = ?; e4 = ?; e5 = ?; e6 = ?; SocialTrust = ?; d1 = ?
Rules for determining the model parameters
Rule 1: All the variances of the independent variables are parameters
Rule 2: All covariances between independent variables are parameters
Rule 3: All factor loadings between a latent variable and its indicators are parameters
Rule 4: All regression coefficients between observed or latent variables are
parameters
Rule 5: (i) The variances of dependent variables, (ii) the covariances between
dependent variables and (iii) the covariances between dependent and independent
variables are never parameters (they are explained by other parameters of the model)
Rule 6: The metric of each latent variable must be set:
For an independent latent variable there are two ways:
Set its variance to a constant (usually 1)
Fix a factor loading ($\lambda$) between the latent variable and one of its indicators (usually to 1)
For a dependent latent variable there is only one way: fix a coefficient between it and one of the
observed variables to a constant (usually 1)
Determining the model parameters:
An equation for each variable (latent or observed) that receives a one-way arrow
(dependent variables) (7)
As many variances as independent variables (8)
As many covariances as two-way arrows (0)
Number of distinct sample moments: 21 = (6 * 7/2)
Number of parameters to be estimated: 13
8 variances of independent variables
4 coefficients of latent factors with indicators
1 regression coefficient
Degrees of freedom: 21 - 13 = 8
Determining the model parameters:
Adding a covariance between e5 and e6, we introduce a parameter, and we lose one
degree of freedom.

Confirmatory Factor Analysis CFA.
Introduction to reliability and validity assessment
[Figure: path diagram of a CFA model with two correlated factors and two indicators per factor]
This model does not contain equations relating factors to one another but only
covariances. All factors are exogenous. No $\beta$ or $\psi$ parameters, only $\lambda$, $\phi$ and $\theta$.
At least three indicators are needed for models with one factor and two per factor for models
with more factors.
In CFA models it is possible to standardize factors to unit variances instead of fixing a
loading to 1. Then the $\phi$ parameters are factor correlations.
For 2 factors with 2 indicators each we have the following equations:
$v_1 = \lambda_{11}f_1 + e_1$    $v_2 = \lambda_{21}f_1 + e_2$
$v_3 = \lambda_{32}f_2 + e_3$    $v_4 = \lambda_{42}f_2 + e_4$
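The model has df=1. With $\phi_{11} = \phi_{22} = 1$:
$$\begin{pmatrix} f_1 \\ f_2 \\ e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \phi_{21} & 0 & 0 & 0 & 0 \\ \phi_{21} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \theta_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & \theta_{44} \end{pmatrix} \right]$$
$$\sigma_{11} = 1\lambda_{11}^2 + \theta_{11}$$
$$\sigma_{21} = 1\lambda_{11}\lambda_{21}$$
$$\sigma_{22} = 1\lambda_{21}^2 + \theta_{22}$$
$$\sigma_{31} = \phi_{21}\lambda_{32}\lambda_{11}$$
$$\sigma_{32} = \phi_{21}\lambda_{32}\lambda_{21}$$
$$\sigma_{33} = 1\lambda_{32}^2 + \theta_{33}$$
$$\sigma_{41} = \phi_{21}\lambda_{42}\lambda_{11}$$
$$\sigma_{42} = \phi_{21}\lambda_{42}\lambda_{21}$$
$$\sigma_{43} = 1\lambda_{42}\lambda_{32}$$
$$\sigma_{44} = 1\lambda_{42}^2 + \theta_{44}$$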
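The correlation between two indicators of the same factor depends on the measurement quality $\kappa$:
$$\kappa_1 = \frac{1\lambda_{11}^2}{\sigma_{11}} = \frac{\sigma_{11} - \theta_{11}}{\sigma_{11}} = 1 - \frac{\theta_{11}}{\sigma_{11}}$$
$$\rho_{21} = \frac{\sigma_{21}}{\sqrt{\sigma_{11}\sigma_{22}}} = \frac{\lambda_{11}\lambda_{21}}{\sqrt{\sigma_{11}\sigma_{22}}} = \sqrt{\frac{\lambda_{11}^2\lambda_{21}^2}{\sigma_{11}\sigma_{22}}} = \sqrt{\kappa_1\kappa_2}$$
and the correlation between two indicators of different factors is attenuated with
respect to the correlation between factors (effect of measurement error):
$$\rho_{31} = \frac{\sigma_{31}}{\sqrt{\sigma_{11}\sigma_{33}}} = \frac{\phi_{21}\lambda_{11}\lambda_{32}}{\sqrt{\sigma_{11}\sigma_{33}}} = \phi_{21}\sqrt{\frac{\lambda_{11}^2\lambda_{32}^2}{\sigma_{11}\sigma_{33}}} = \phi_{21}\sqrt{\kappa_1\kappa_3}$$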
A CFA model is likely to fit the data only if items of the same factor correlate highly and
higher than items of different factors. We advise researchers to carefully examine the
correlation matrix prior to fitting a CFA model.
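Reliability for each item (squared standardized loading):
trstprl: $\kappa_1 = 0.857^2 = 0.735$    trstlgl: $\kappa_2 = 0.793^2 = 0.629$
stfgov: $\kappa_3 = 0.888^2 = 0.789$    stfdem: $\kappa_4$ = ???
Correlations:
$\rho_{21} = \sqrt{\kappa_1\kappa_2} = \sqrt{0.735 \times 0.629} = 0.680$
$\rho_{31} = \phi_{21}\sqrt{\kappa_1\kappa_3} = 0.862 \times \sqrt{0.735 \times 0.789} = 0.656$
$\rho_{41}$ = ????
$\rho_{32}$ = ????
$\rho_{43}$ = ????
$\rho_{42}$ = ????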
Amos Output

Observed, endogenous variables
trstlgl
trstprl
stfdem
stfgov
Unobserved, exogenous variables
e2
e1
e4
SATCNTRY
e3
PolTrust
Number of variables in your model: 10
Sample Covariances (Group number 1)
stfgov stfdem trstprl trstlgl
stfgov 5,534
stfdem 3,792 5,182
trstprl 3,861 3,152 6,020
trstlgl 3,446 3,304 4,153 6,204
Sample Correlations (Group number 1)
stfgov stfdem trstprl trstlgl
stfgov 1,000
stfdem ,708 1,000
trstprl ,669 ,564 1,000
trstlgl ,588 ,583 ,680 1,000
Scalar Estimates (Group number 1 - Default model)
Maximum Likelihood Estimates
Regression Weights: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
stfgov <--- SATCNTRY 2,089 ,042 49,720 *** par_1
stfdem <--- SATCNTRY 1,815 ,042 43,216 *** par_3
trstlgl <--- PolTrust 1,976 ,046 42,608 *** par_4
trstprl <--- PolTrust 2,102 ,045 46,963 *** par_5
Standardized Regression Weights: (Group number 1 - Default model)
Estimate ($\lambda_i$)
stfgov <--- SATCNTRY ,888
stfdem <--- SATCNTRY ,797
trstlgl <--- PolTrust ,793
trstprl <--- PolTrust ,857
Squared Multiple Correlations (in standardized terms, $1 - \lambda_i^2 = \theta_i$):
Estimate
stfgov 0,789 (1-0,789=0,211)
stfdem 0,635 (1-0,635=0,365)
trstlgl 0,629 (1-0,629=0,371)
trstprl 0,735 (1-0,735=0,265)
Covariances: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
SATCNTRY <--> PolTrust ,862 ,011 76,185 *** par_2
Correlations: (Group number 1 - Default model)
Estimate
SATCNTRY <--> PolTrust ,862
The correlation has the same value as the covariance because the variances of the latent
factors are fixed to 1 ($\phi_{11} = \phi_{22} = 1$).

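Variances: (Group number 1 - Default model)
Estimate S.E. C.R. P Label
PolTrust 1,000
SATCNTRY 1,000
e2 2,300 ,098 23,458 *** par_6
e1 1,601 ,093 17,164 *** par_7
e4 1,887 ,079 23,765 *** par_8
e3 1,169 ,083 14,103 *** par_9
Implied (for all variables) Correlations (Group number 1 - Default model)
PolTrust SATCNTRY stfgov stfdem trstprl trstlgl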
PolTrust 1,000
SATCNTRY ,862 1,000
stfgov ,766 ,888 1,000
stfdem ,688 ,797 ,708 1,000
trstprl ,857 ,739 ,656 ,589 1,000
trstlgl ,793 ,684 ,608 ,545 ,680 1,000
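The implied correlations among the observed variables can be reproduced from the estimates with the usual CFA covariance structure, implied Sigma = Lambda Phi Lambda' + Theta. The sketch below uses NumPy and the rounded Amos estimates. The pairing of error terms with indicators (e3 with stfgov, e4 with stfdem, e1 with trstprl, e2 with trstlgl) is an assumption inferred from the squared multiple correlations, and the matrix names are illustrative.

```python
import numpy as np

# Indicator order: stfgov, stfdem, trstprl, trstlgl; factor order: SATCNTRY, PolTrust
Lambda = np.array([[2.089, 0.0],
                   [1.815, 0.0],
                   [0.0, 2.102],
                   [0.0, 1.976]])
Phi = np.array([[1.0, 0.862],
                [0.862, 1.0]])      # factor variances fixed to 1, so this is a correlation matrix
# Assumed pairing of error variances with indicators: e3, e4, e1, e2
Theta = np.diag([1.169, 1.887, 1.601, 2.300])

Sigma = Lambda @ Phi @ Lambda.T + Theta   # implied covariance matrix
d = 1.0 / np.sqrt(np.diag(Sigma))
implied_corr = Sigma * np.outer(d, d)     # rescale covariances to correlations

print(np.round(implied_corr, 3))
```

Comparing this matrix with the sample correlations shows where the model reproduces the data closely and where it does not.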
Purification of the measures
Item-total correlation serves as a criterion for initial assessment and purification.
Various cut-off points are adopted:
0.30 by Cristobal et al. (2007)
0.40 by Loiacono et al. (2002)
0.50 by Francis and White (2002) and Kim and Stoel (2004)
Wolfinbarger and Gilly (2003) are rigorous in retaining only items that
load at 0.50 or more on a factor
do not load at more than 0.50 on two factors
have an item-total correlation of more than 0.40
Reliability and Validity
A measure is reliable to the extent that independent but comparable measures of the
same trait or construct of a given object agree. Reliability depends on how much of
the variation in scores is attributable to random or chance errors. If a measure is
perfectly reliable, XR = 0 (XR: random error).
A measure is valid when the differences in observed scores reflect true differences
on the characteristic one is attempting to measure and nothing else, that is, XO = XT.
a) Reliability of individual items:

o loadings greater than 0.50 on the respective construct (Hulland, 1999; White
et al., 2003; Ribbink et al. 2004)
o exhibit loadings with the intended construct of .70 or more, and are
statistically significant (Ledden, 2007)
b) Reliability of a construct or Internal consistency of the scale
It makes it possible to check the internal consistency of all indicators in measuring the concept
(thoroughness with which all indicators measure the same thing)
Internal homogeneity of a set of items:
o Composite Reliability (CR) greater than 0.70 (Anderson and Gerbing, 1988;
Bagozzi and Yi, 1988)
o Correlation between each item and its construct >0.5 and correlations among
items from the same construct >0.3.
o Cronbach's $\alpha$ greater than 0.70 (Fornell and Larcker, 1981; Nunnally and
Bernstein, 1994)
c) Convergent validity
the extent to which a set of items assumed to represent a construct does in fact
converge on the same construct:
o Average variance extracted (AVE - amount of variance that a construct obtains
from the indicators in relation to the amount of variance of the measurement
error) greater than 0.50 (Fornell and Larcker, 1981; Chin and Newsted, 1999;
Gounaris and Dimitriadis, 2003)
o Factor loadings greater than 0.5 (Grewal et al., 1998).
d) Discriminant validity (when there are several scales in the model)
the extent to which measures of theoretically unrelated constructs do not correlate
with one another:
inter-factor correlations are less than the square root of the average variance
extracted (AVE) (Fornell and Larcker, 1981)
Numerical example:
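Reliability of a construct, computed from the standardized loadings $\lambda_i$ and error variances $\theta_i = 1 - \lambda_i^2$:
$$CR = \frac{(\sum \lambda_i)^2}{(\sum \lambda_i)^2 + \sum \theta_i}$$
Reliability (SATCNTRY) = (0.888+0.797)^2 / [(0.888+0.797)^2 + 0.211 + 0.365] = 0.831
Reliability (PolTrust) = (0.793+0.857)^2 / [(0.793+0.857)^2 + 0.371 + 0.265] = 0.811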
Convergent validity
1st) Factor loadings are significant and greater than 0.5
2nd) Average Variance Extracted (AVE) for each of the factors > 0.5.
Squared Multiple Correlations and AVE:
Estimate
stfgov 0,789
stfdem 0,635
AVE (SATCNTRY) = (0.789+0.635)/2 = 0.712
trstlgl 0,629
trstprl 0,735
AVE (PolTrust) = (0.629+0.735)/2 = 0.682
80
Discriminant validity of a construct
Average variance extracted (AVE). A construct must share more variance with its indicators than with the other constructs of the model. This holds when the square root of the AVE of each factor in a pair > the estimated correlation between those factors.
           StfCntry   PolTrust
StfCntry   0.844
PolTrust   0.862      0.826
Diagonal: sqrt(0.712) = 0.844 and sqrt(0.682) = 0.826. Off-diagonal: estimated correlation between the factors.
Correlations:
                         Estimate
SATCNTRY <--> PolTrust   0.862
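The AVE and the Fornell and Larcker comparison can be checked with a few lines. This is a sketch using only the squared multiple correlations and the factor correlation reported in this example. The variable names are illustrative.

```python
from math import sqrt

# Squared multiple correlations (squared standardized loadings) per factor
smc = {
    "SATCNTRY": [0.789, 0.635],  # stfgov, stfdem
    "PolTrust": [0.629, 0.735],  # trstlgl, trstprl
}
factor_corr = 0.862  # estimated correlation SATCNTRY <--> PolTrust


def ave(squared_loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(squared_loadings) / len(squared_loadings)


ave_by_factor = {name: ave(values) for name, values in smc.items()}
sqrt_ave = {name: sqrt(value) for name, value in ave_by_factor.items()}

# Fornell-Larcker criterion: each sqrt(AVE) must exceed the factor correlation
meets_criterion = all(root > factor_corr for root in sqrt_ave.values())

for name in smc:
    print(name, round(ave_by_factor[name], 3), round(sqrt_ave[name], 3))
print("criterion met:", meets_criterion)
```

With these figures both AVE values pass the 0.5 threshold for convergent validity, but the factor correlation is larger than either square root, so the comparison flags this pair of factors.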
81
Exploratory Factor Analysis (EFA)
82
Comparison with exploratory factor analysis (EFA)
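\[
\begin{aligned}
v_1 &= \lambda_{11} f_1 + \lambda_{12} f_2 + e_1 \\
v_2 &= \lambda_{21} f_1 + \lambda_{22} f_2 + e_2 \\
v_3 &= \lambda_{31} f_1 + \lambda_{32} f_2 + e_3 \\
v_4 &= \lambda_{41} f_1 + \lambda_{42} f_2 + e_4
\end{aligned}
\qquad
\begin{pmatrix} f_1 \\ f_2 \\ e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix}
\sim N\left(
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \theta_{11} & 0 & 0 & 0 \\
0 & 0 & 0 & \theta_{22} & 0 & 0 \\
0 & 0 & 0 & 0 & \theta_{33} & 0 \\
0 & 0 & 0 & 0 & 0 & \theta_{44}
\end{pmatrix}
\right)
\]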
In EFA, which items measure which dimensions is the outcome; in CFA it is the input.
In EFA, questionnaire items aim globally at a broad concept; in CFA, each questionnaire item is designed to tackle a specific dimension of the concept.
Random and systematic error. Reliability and validity assessment with
CFA
Reliability: Extent to which a measurement procedure would yield the same result
upon several independent trials under identical conditions. In other words, low random
measurement error (any systematic error would replicate). Random measurement error is
a problem for OLS regression but not for SEM with multiple indicators, because it is
accounted for by the parameters.
Validity: Extent to which a measurement procedure measures what it is intended to
measure and only what it is intended to measure, except for random measurement error.
In other words, absence of systematic error.
Assuming the validity of v, its reliability is the percentage of variance explained by f.
Always follow this golden rule:
Estimate reliability after validity has been diagnosed.
Test the specification of measurement equations in a CFA model prior to specifying
equations relating factors. Otherwise, relationships among factors might be biased
(specification errors) or even meaningless (invalidity).
84
Construct validation: Estimate a CFA model that assumes validity...
All items load on the factor they are supposed to measure (a second loading is a sign
of measuring another factor which is in the model).
No error correlations are specified (error correlations contain common unknown
variance, a sign of measuring an unknown factor which is not in the model).
....and diagnose its goodness of fit. You can never be certain of validity, but a CFA model
can help detect signs of invalidity such as:
It does not correctly reproduce the covariance matrix (additional loadings or error
correlations are needed, thus revealing mixed items, additional necessary dimensions).
Convergent invalidity.
Some variables have R2 too low to be attributed solely to random error (convergent invalidity).
Some factors have correlations very close to unity (discriminant invalidity).
Some factors have correlations of unexpected signs or magnitudes (nomological
invalidity).
85
Modelling stages in SEM
Verbal theories
1) SPECIFICATION. Model: equations and assumptions
2) IDENTIFICATION. Estimable model
3) DATA COLLECTION. Exploratory data analysis. Computation of S
4) ESTIMATION. Methods to fit Σ(p) to S
5) FIT DIAGNOSTICS. Discrepancies between Σ(p) and S
ADEQUATE?
NO: MODIFICATION on theoretical and statistical grounds, then go through the stages again
YES: continue to stage 6
6) UTILIZATION. Theory validation, validity and reliability assessment ...
1) Specification
Formal establishment of a statistical model: set of statistical and substantive
assumptions that structure the data according to a theory.
Equations: one or two of the following systems of equations:
Relating factors or error free variables to one another (structural equations).
Relating factors to indicators with error (measurement equations).
Parameters: two types:
o Free (unknown and freely estimated).
o Fixed (known and constrained to a given value, usually 0 or 1).
The amount of the researcher's prior knowledge will affect the modelling strategy:
If this knowledge is exhaustive and detailed, it will be easily translated into a model specification. The researcher's aim will simply be to use the data to estimate and confirm or reject the model (confirmatory strategy).
If this knowledge is less exhaustive and detailed, the fixed or free character of a number of parameters will be dubious. This will lead to a model modification process by repeatedly going through the modelling stages (exploratory strategy).
Full SEM model:
88
Equations:
StfCntry = β31 * SocialTrust + d3
PolTrust = β21 * SocialTrust + d2
ppltrst = 1 * SocialTrust + e1
pplfair = λ21 * SocialTrust + e2
pplhlp = λ31 * SocialTrust + e3
trstprl = 1 * PolTrust + e4
trstplt = λ52 * PolTrust + e5
trstlgl = λ62 * PolTrust + e6
stfeco = λ73 * StfCntry + e7
stfgov = λ83 * StfCntry + e8
stfdem = 1 * StfCntry + e9

Var(e1) = θ11; Var(e2) = θ22; Var(e3) = θ33; Var(e4) = θ44; Var(e5) = θ55; Var(e6) = θ66; Var(e7) = θ77; Var(e8) = θ88; Var(e9) = θ99
Var(SocialTrust) = φ11; Var(d2) = ψ22; Var(d3) = ψ33
Total number of parameters = 20
2) Identification
Can model parameters be derived from variances and covariances?
Identification must be studied prior to data collection
If a model is not identified:
o Seek more restrictive specifications with additional constraints (if theoretically
justifiable).
o Add more indicators or more exogenous factors.
Identification conditions
o Underidentification (df<0): infinite number of solutions that make S equal Σ(p).
o Possibly identified (df=0): there may be a unique solution that makes S equal Σ(p). These models are less interesting in that their restrictions are not testable.
o Possibly overidentified (df>0): there may be a unique solution that minimizes discrepancies between S and Σ(p). Only these models, more precisely their restrictions, can be tested from the data.
Example Full SEM model:
9 observed variables lead to (9×10/2) = 45 variances and covariances: possibly overidentified model.
Total number of parameters = 20
Degrees of freedom = 45 - 20 = 25 > 0
The model fulfils a set of sufficient conditions for identification:
1) Equations relating factors are recursive
2) Disturbances are uncorrelated
3) All factors have at least two pure indicators.
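The counting rule used above can be written as a small helper. This is a sketch: the function names are illustrative, and a non-negative result is only a necessary condition, so the sufficient conditions listed above still have to be checked.

```python
def degrees_of_freedom(n_observed, n_free_parameters):
    """Distinct variances and covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters


def identification_status(df):
    """Label from the counting rule only (necessary, not sufficient)."""
    if df < 0:
        return "underidentified"
    if df == 0:
        return "possibly identified"
    return "possibly overidentified"


# Full SEM example: 9 observed variables, 20 free parameters
df_full = degrees_of_freedom(9, 20)
print(df_full, identification_status(df_full))
```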
91
3) Data collection and exploratory analyses
Valid sampling methods
In their standard form, SEM assumes simple random sampling. Extensions to stratified
and cluster samples have been recently developed. In any case, they must be random
samples.
Sample size
Sample sizes in the 200-500 range are usually enough. Sample requirements increase:
For smaller R2 and percentages of explained variance.
When collinearity is greater.
For smaller numbers of indicators per factor (especially less than 3).
Under non normality, the required sample size is larger (in the 400-800 range).
Outlier and non-linearity detection
As before doing any other type of statistical modeling, outliers and non-linear
relationships must be detected by means of exploratory data analysis.
92
4) Estimation
First estimate the sample variances and covariances (S), and then find the best-fitting values of the parameters p.
A fit function related to the size of the residuals in S-Σ(p) is minimized.
Each choice of fit function results in an alternative estimation method. One of these choices leads to the maximum likelihood (ML) estimator, which is the most often used.
Estimation assumes that a covariance matrix is analyzed. Estimates obtained from a correlation matrix are correct only under very specific conditions.
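For reference, the standard ML fit function can be written as follows. The slides do not show it, so the notation here is an assumption: k is the number of observed variables and N the sample size. Software differs on whether the χ² statistic uses N or N-1 as multiplier.

```latex
\[
F_{ML}(p) = \ln\lvert\Sigma(p)\rvert
          + \operatorname{tr}\!\left(S\,\Sigma(p)^{-1}\right)
          - \ln\lvert S\rvert - k ,
\qquad
\chi^2 = (N-1)\,\min_{p} F_{ML}(p)
\]
```

F_ML is zero when Σ(p) reproduces S exactly and grows as the discrepancies between the two matrices grow.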
Normality assumed: ML and GLS
Normality not assumed: ULS, Scale-free LS, ADF
The two most commonly used estimation techniques are maximum likelihood (ML) and normal-theory generalized least squares (GLS).
ML and GLS: large sample size, continuous data, and assumption of multivariate normality.
Unweighted least squares (ULS): scale dependent.
Asymptotically distribution free (ADF), also known as weighted least squares (WLS): for serious departures from normality.
Examining the ordered correlation matrix
Let us look at the correlations as well, and spot low correlations between items measuring the same dimension or large correlations between items measuring different dimensions:
stfeco stfgov stfdem ppltrst pplfair pplhlp trstprl trstplt trstlgl

stfeco 1,000
stfgov ,718 1,000
stfdem ,653 ,708 1,000
ppltrst ,281 ,276 ,276 1,000
pplfair ,285 ,273 ,275 ,543 1,000
pplhlp ,298 ,277 ,262 ,409 ,442 1,000
trstprl ,554 ,669 ,564 ,305 ,289 ,247 1,000
trstplt ,515 ,642 ,530 ,280 ,258 ,217 ,719 1,000
trstlgl ,526 ,588 ,583 ,308 ,288 ,241 ,680 ,600 1,000
95
Amos output

Observed, endogenous variables: trstlgl, trstplt, trstprl, pplhlp, pplfair, ppltrst, stfdem, stfgov, stfeco
Unobserved, endogenous variables: PolTrust, StfCntry
Unobserved, exogenous variables: e6, e5, e4, SocialTrust, e3, e2, e1, e9, e8, e7, d3, d2



Chi-square = 1097,457
97

Regression Weights:
                            Estimate  S.E.  C.R.    P    Label
PolTrust <--- SocialTrust   1,985     ,101  19,721  ***  par_7
StfCntry <--- SocialTrust   1,670     ,087  19,200  ***  par_8
trstlgl <--- PolTrust       ,897      ,021  43,529  ***  par_1
trstplt <--- PolTrust       ,845      ,018  46,008  ***  par_2
trstprl <--- PolTrust       1,000
pplhlp <--- SocialTrust     ,911      ,064  14,299  ***  par_3
pplfair <--- SocialTrust    ,984      ,065  15,135  ***  par_4
ppltrst <--- SocialTrust    1,000
stfdem <--- StfCntry        1,000
stfgov <--- StfCntry        1,155     ,024  47,207  ***  par_5
stfeco <--- StfCntry        ,975      ,023  42,054  ***  par_6
Standardized Regression Weights:
                            Estimate
PolTrust <--- SocialTrust   ,904
StfCntry <--- SocialTrust   ,899
trstlgl <--- PolTrust       ,775
trstplt <--- PolTrust       ,806
trstprl <--- PolTrust       ,877
pplhlp <--- SocialTrust     ,396
pplfair <--- SocialTrust    ,432
ppltrst <--- SocialTrust    ,440
stfdem <--- StfCntry        ,800
stfgov <--- StfCntry        ,894
stfeco <--- StfCntry        ,803
Variances:
              Estimate  S.E.  C.R.    P    Label
SocialTrust   ,960      ,093  10,320  ***  par_9
d3            ,636      ,076  8,417   ***  par_10
d2            ,841      ,106  7,927   ***  par_11
e6            2,478     ,090  27,622  ***  par_12
e5            1,778     ,068  25,983  ***  par_13
e4            1,393     ,071  19,718  ***  par_14
e3            4,296     ,130  32,982  ***  par_15
e2            4,050     ,124  32,705  ***  par_16
e1            3,996     ,122  32,639  ***  par_17
e9            1,868     ,069  26,999  ***  par_18
e8            1,115     ,060  18,450  ***  par_19
e7            1,737     ,065  26,838  ***  par_20
Squared Multiple Correlations:
Estimate
StfCntry ,808
PolTrust ,818
stfeco ,644
stfgov ,799
stfdem ,640
ppltrst ,194
pplfair ,187
pplhlp ,156
trstprl ,769
trstplt ,650
trstlgl ,601
100
SocialTrust StfCntry PolTrust stfeco stfgov stfdem ppltrst pplfair pplhlp trstprl trstplt trstlgl
SocialTrust 1,000
StfCntry ,899 1,000
PolTrust ,904 ,813 1,000
stfeco ,722 ,803 ,653 1,000
stfgov ,803 ,894 ,727 ,717 1,000
stfdem ,719 ,800 ,650 ,642 ,715 1,000
ppltrst ,440 ,396 ,398 ,318 ,354 ,316 1,000
pplfair ,432 ,389 ,391 ,312 ,347 ,311 ,190 1,000
pplhlp ,396 ,356 ,358 ,285 ,318 ,284 ,174 ,171 1,000
trstprl ,793 ,713 ,877 ,572 ,637 ,570 ,349 ,343 ,314 1,000
trstplt ,729 ,655 ,806 ,526 ,586 ,524 ,321 ,315 ,288 ,707 1,000
trstlgl ,701 ,630 ,775 ,506 ,563 ,504 ,309 ,303 ,277 ,679 ,625 1,000
101
5) Fit diagnostics
Interpretation does not proceed until the goodness of fit has been assessed.
The fit diagnostics attempt to determine if the model is correct and useful.
o Correct model: its restrictions are true in the population. Relationships are
correctly specified without the omission of relevant parameters.
o In a correct model, the differences between S and Σ(p) are small and random.
o Correctness must not be strictly understood. A model must be an approximation
of reality, not an exact copy of it.
o Thus, a good model will be a compromise between parsimony and
approximation.
Diagnostics will usually do well at distinguishing really badly fitting models from fairly well fitting models. Many models will fit fairly well (even exactly equally well if equivalent) and will be hard to distinguish statistically; they can only be distinguished theoretically.
The χ² goodness of fit statistic
Null hypothesis: the model is correct, without omitted relevant parameters:
H0: Σ = Σ(p)
The χ² goodness of fit statistic follows a χ² distribution with g degrees of freedom.
Rejection implies concluding that some relevant parameters have been omitted.
Sample size and power of the test are often high. Researchers are usually willing to accept approximately correct models with small misspecifications, which are rejected due to the high power. Quantifying the degree of misfit is more useful than testing the hypothesis of exact fit.
5.1 Global diagnostics
First look for serious problems (common for small samples, very badly fitting
models, and models with two indicators per factor):
o Lack of convergence of the estimation algorithm.
o Underidentification.

Degrees of freedom (10 - 12): -2
The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 1 additional constraint.
o Inadmissible estimates (e.g. negative variances, correlations larger than 1...).
Fix non-significant negative variances to zero.
Revise the model if there are significant negative variances.
Merge into a single factor any pair of factors whose correlation is larger than 1 or not significantly lower than 1.
The Tucker and Lewis (1973) index (TLI) and Bentler's (1990) comparative fit index (CFI) introduce the degrees of freedom of the base (g_b) and researcher (g) models to account for parsimony. They will increase after adding parameters only if the χ² statistic decreases more substantially than g. With χ²_b denoting the χ² statistic of the base model:
\[
TLI = \frac{\dfrac{\chi_b^2}{g_b} - \dfrac{\chi^2}{g}}{\dfrac{\chi_b^2}{g_b} - 1}
\qquad
CFI = \min\left[\frac{(\chi_b^2 - g_b) - (\chi^2 - g)}{\chi_b^2 - g_b}\,;\,1\right]
\]
Root mean squared error of approximation (RMSEA) (Steiger, 1990):
\[
RMSEA = \sqrt{\max\left(\frac{\chi^2 - g}{N g}\,;\,0\right)}
\]
Values below 0.05 are considered acceptable.
The sampling distribution is known, which makes it possible to do confidence
intervals and test the hypothesis of approximate fit. If both extremes of the interval
are larger than 0.05, a very bad fit can be concluded. If both extremes are below 0.05,
a very good fit can be concluded.
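The three indices are easy to compute by hand from the χ² statistics and degrees of freedom that Amos prints. A minimal sketch follows, using the CMIN values reported later in this document for the two-factor CFA model (121.665 with 8 df) and its independence model (8726.410 with 15 df). The sample size n is not stated in this section, so RMSEA is left as a function of n.

```python
from math import sqrt


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index from researcher and base model chi-squares."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1)


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index, capped at 1."""
    excess_base = chi2_base - df_base
    return min((excess_base - (chi2 - df)) / excess_base, 1.0)


def rmsea(chi2, df, n):
    """Root mean squared error of approximation; n is the sample size."""
    return sqrt(max((chi2 - df) / (n * df), 0.0))


# CFA example: default model vs. independence (base) model
tli_cfa = tli(121.665, 8, 8726.410, 15)
cfi_cfa = cfi(121.665, 8, 8726.410, 15)
print(round(tli_cfa, 3), round(cfi_cfa, 3))
```

The printed values match the TLI and CFI that Amos reports for that model (,976 and ,987).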
106
5.2. Detailed diagnostics
Are standardized estimated values reasonable and of the expected sign?
Are there significant residuals that suggest the addition of parameters? (To obtain them in Amos: Analysis properties \ Output \ Residual moments.) The standardized values can be read as t-values.
Each residual covariance has been divided by an estimate of its standard error. In sufficiently large samples, these standardized residual covariances have a standard normal distribution if the model is correct. So, if the model is correct, most of them should be less than two in absolute value.
Are there low R2 values suggesting the omission of explanatory variables or low
values suggesting a lack of validity? (To estimate them in Amos Analysis properties \
Output \ Squared Multiple Correlations)
107
The modification index is an individual significance test of omitted parameters (H0: the omitted parameter is zero in the population).
o Reject the hypothesis above the critical χ² value with 1 df: 3.84 for a type I risk of 5%.
o Always consider the expected standardized estimated parameter and its sign: if power is high, parameters of a substantively negligible size can be statistically significant. Only add parameters of a substantial size.
Residuals and modification indices can suggest the addition of parameters in order to improve fit. A model can also be improved by dropping irrelevant parameters (parsimony principle). The usual t statistic tests the significance of included parameters (H0: the included parameter is zero in the population).
o Non-significant disturbance covariances and measurement error covariances should be dropped from the model. Non-significant β parameters (relationships among factors) may be dropped from the model if their theoretical argumentation is weak. Non-significant λ parameters (loadings) reveal invalidity.
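The two screening rules for modification indices (significance and substantial size) can be sketched as a simple filter. The rows below are a few entries from the Amos modification index output of the full SEM example in this document. The size threshold min_abs_change is a hypothetical choice for illustration, and it is applied to the unstandardized "Par Change" column because that is what the output provides.

```python
CRITICAL_CHI2_1DF = 3.84  # 5% critical value of chi-square with 1 df


def candidates(mod_indices, min_abs_change):
    """Keep significant modification indices whose expected change is large.

    mod_indices: iterable of (label, modification index, expected change).
    min_abs_change: hypothetical cut-off for a 'substantial' change.
    """
    selected = [
        row for row in mod_indices
        if row[1] > CRITICAL_CHI2_1DF and abs(row[2]) >= min_abs_change
    ]
    # Largest modification index first
    return sorted(selected, key=lambda row: row[1], reverse=True)


rows = [
    ("e7 <--> d2", 10.661, -0.152),
    ("e2 <--> e1", 474.885, 1.882),
    ("e3 <--> d2", 71.245, -0.570),
    ("e3 <--> e7", 7.221, 0.170),
]
shortlist = candidates(rows, min_abs_change=0.3)
for label, mi, change in shortlist:
    print(label, mi, change)
```

A shortlist like this is only a starting point: each candidate still needs a theoretical interpretation before it is added.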
108
stfeco stfgov stfdem ppltrst pplfair pplhlp trstprl trstplt trstlgl

stfeco ,000
stfgov ,037 ,000
stfdem ,442 -,255 ,000
ppltrst -1,694 -3,550 -1,881 ,000
pplfair -1,257 -3,363 -1,656 16,729 ,000
pplhlp ,569 -1,851 -1,039 11,168 12,884 ,000
trstprl -,739 1,307 -,236 -1,988 -2,438 -3,059 ,000
trstplt -,483 2,350 ,262 -1,888 -2,613 -3,316 ,487 ,000
trstlgl ,888 1,053 3,394 -,047 -,685 -1,682 ,010 -1,004 ,000
109
Modification Indices
M.I. Par Change

d2 <--> d3 37,757 ,224
e7 <--> d2 10,661 -,152
e8 <--> d2 59,293 ,325
e1 <--> d3 68,710 -,454
e1 <--> d2 21,676 -,304
e1 <--> e8 44,484 -,380
e2 <--> d3 55,293 -,410
e2 <--> d2 40,987 -,421
e2 <--> e8 42,665 -,374
e2 <--> e1 474,885 1,882
e3 <--> d3 9,736 -,177
e3 <--> d2 71,245 -,570
e3 <--> e7 7,221 ,170
110
Model Fit Summary
RMR, GFI
Model RMR GFI AGFI PGFI

Default model ,417 ,894 ,809 ,496
Saturated model ,000 1,000
Independence model 2,251 ,366 ,207 ,293
Baseline Comparisons
Model                NFI Delta1  RFI rho1  IFI Delta2  TLI rho2  CFI
Default model        ,897        ,852      ,899        ,854      ,899
Saturated model      1,000                 1,000                 1,000
Independence model   ,000        ,000      ,000        ,000      ,000
RMSEA
Model RMSEA LO 90 HI 90 PCLOSE

Default model ,136 ,129 ,143 ,000
Independence model ,356 ,350 ,361 ,000
111
5.3. Model modification. Capitalization on chance
Models frequently fail to pass the diagnostics.
Which modifications should be introduced, and in which order?
o Introduce modifications one at a time, and carefully examine results before introducing the next. One modification can modify the need for another.
o First improve fit (add parameters). Then improve parsimony (drop parameters).
o Disregard high modification indices with very small expected estimates.
o Consider models with good descriptive fit indices, even if the χ² test rejects them (parsimony-approximation compromise).
o Avoid adding theoretically uninterpretable parameters, no matter how significant.
o Make few modifications.
o The selected model must pass the diagnostics and be theoretically relevant and useful.
o Modified models can be compared with CFI and RMSEA.
112
Model modification has some undesirable statistical consequences, especially if
modifications are blindly done using only statistics, that is, without theory.
Even if model modification has been done carefully, modifications are based on a
particular sample. Have we reached a model that fits the population?
Bias of estimates and significance tests: only large and significant parameters have
been considered to be candidates for addition.
The introduction of modifications that improve the fit to the sample but not to the
population is known as capitalization on chance.
The only solution is to check that the model fits well beyond the particular sample
used:
o Crossvalidation: estimation and goodness of fit test of the model on an
independent sample of the same population. If only one sample is available, it
can be split: the first half is used for model modification and the second for
validation. Crossvalidation is successful if the model fits the second sample
reasonably well.
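The split-sample procedure can be sketched as a random halving of the case identifiers. The function name and the seed are illustrative. The model would then be modified on the first half and re-estimated, unchanged, on the second.

```python
import random


def split_half(case_ids, seed=0):
    """Randomly split cases into a calibration half and a validation half."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Hypothetical sample of 1000 respondents identified by row number
calibration, validation = split_half(range(1000))
print(len(calibration), len(validation))
```

Splitting at random, rather than taking the first and last rows of the file, avoids any ordering in the data ending up as a difference between the two halves.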
113
Complete Example Modeling Stages:
CFA model
114
trstprl = 1 * PolTrust + e4
trstplt = λ52 * PolTrust + e5
trstlgl = λ62 * PolTrust + e6
stfeco = λ73 * StfCntry + e7
stfgov = λ83 * StfCntry + e8
stfdem = 1 * StfCntry + e9

Var(e4) = θ44; Var(e5) = θ55; Var(e6) = θ66; Var(e7) = θ77; Var(e8) = θ88; Var(e9) = θ99
Cov(StfCntry, PolTrust) = φ23
Var(StfCntry) = φ33
Var(PolTrust) = φ22
The model has 6×7/2 = 21 variances and covariances, and 13 parameters (6 error variances θ, 2 factor variances φ, 1 factor covariance φ, 4 loadings λ): 8 degrees of freedom.
Each factor has at least 2 pure indicators: the measurement part is identified.
In the complete model with β parameters, the factors are related in a recursive system without error covariances: it is identified.
This model only has measurement equations. The loading of the first variable in each factor is equal to 1. The remaining loadings are free. Each observed variable also has an error variance θ. By default all factor variances and covariances are free. To constrain factors to be uncorrelated, one would add the constrained parameter with a value of 0 to the covariance:
Estimation
Observed, endogenous variables: trstlgl, trstplt, trstprl, stfdem, stfgov, stfeco
Unobserved, exogenous variables: PolTrust, e6, e5, e4, StfCntry, e9, e8, e7

Parameter Summary
          Weights  Covariances  Variances  Means  Intercepts  Total
Fixed     8        0            0          0      0           8
Labeled   0        0            0          0      0           0
Total     12       1            8          0      0           21
stfeco stfgov stfdem trstprl trstplt trstlgl

stfeco 4,885
stfgov 3,734 5,534
stfdem 3,284 3,792 5,182
trstprl 3,007 3,861 3,152 6,020
trstplt 2,565 3,405 2,721 3,977 5,080
trstlgl 2,898 3,446 3,304 4,153 3,370 6,204
118

stfeco 1,000
stfgov ,718 1,000
stfdem ,653 ,708 1,000
trstprl ,554 ,669 ,564 1,000
trstplt ,515 ,642 ,530 ,719 1,000
trstlgl ,526 ,588 ,583 ,680 ,600 1,000


Chi-square = 121,665
119
Estimates (Group number 1 - Default model). Maximum Likelihood Estimates
Regression Weights:
                        Estimate  S.E.  C.R.    P    Label
trstlgl <--- PolTrust   ,893      ,021  43,329  ***  par_1   ### Significant ###
trstplt <--- PolTrust   ,848      ,018  46,413  ***  par_2   ### Significant ###
stfgov <--- StfCntry    1,174     ,025  47,430  ***  par_3   ### Significant ###
stfeco <--- StfCntry    ,970      ,023  41,401  ***  par_4   ### Significant ###
Standardized Regression Weights (### Large values ###). We can compute R2 as the squared estimate:
                        Estimate  R2
trstlgl <--- PolTrust   ,771      ,771² = 0,594
trstplt <--- PolTrust   ,810      ,810² = 0,656
trstprl <--- PolTrust   ,877      ,877² = 0,769
stfdem <--- StfCntry    ,795      ,795² = 0,632
stfgov <--- StfCntry    ,903      ,903² = 0,815
stfeco <--- StfCntry    ,795      ,795² = 0,632
Squared Multiple Correlations: (Group number 1 - Default model). This is R2
          Estimate
stfeco    ,631 = ,795²
stfgov    ,816 = ,903²
stfdem    ,632 = ,795²
trstprl   ,769 = ,877²
trstplt   ,656 = ,810²
trstlgl   ,595 = ,771²
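The relation between the two tables (R2 of an indicator with a single loading equals its squared standardized loading) can be checked directly. This is a sketch using the standardized estimates printed above. Differences of ,001 from the Amos column arise because the loadings are rounded to three decimals before squaring.

```python
# Standardized loadings from the Amos output of the CFA model
std_loadings = {
    "stfeco": 0.795,
    "stfgov": 0.903,
    "stfdem": 0.795,
    "trstprl": 0.877,
    "trstplt": 0.810,
    "trstlgl": 0.771,
}

# For an indicator loading on one factor only, R2 is the squared loading
r_squared = {item: round(loading ** 2, 3) for item, loading in std_loadings.items()}

for item, value in r_squared.items():
    print(item, value)
```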

Covariances:
                         Estimate  S.E.  C.R.    P    Label
PolTrust <--> StfCntry   3,281     ,129  25,485  ***  par_5   ### Significant ###
Correlations:
                         Estimate
PolTrust <--> StfCntry   ,843   ### Smaller than 1 ###
Variances: ### Positive ###
           Estimate  S.E.  C.R.    P    Label
PolTrust   4,627     ,181  25,579  ***  par_6
StfCntry   3,275     ,147  22,256  ***  par_7
e6         2,513     ,090  27,853  ***  par_8
e5         1,750     ,068  25,858  ***  par_9
e4         1,393     ,070  19,843  ***  par_10
e9         1,907     ,070  27,400  ***  par_11
e8         1,020     ,059  17,269  ***  par_12
e7         1,800     ,066  27,421  ***  par_13
StfCntry PolTrust stfeco stfgov stfdem trstprl trstplt trstlgl

StfCntry 1,000
PolTrust ,843 1,000
stfeco ,795 ,670 1,000
stfgov ,903 ,761 ,718 1,000
stfdem ,795 ,670 ,632 ,718 1,000
trstprl ,739 ,877 ,587 ,667 ,587 1,000
trstplt ,682 ,810 ,542 ,616 ,542 ,710 1,000
trstlgl ,650 ,771 ,516 ,587 ,517 ,676 ,624 1,000
123
FIT DIAGNOSTICS
Model Fit Summary
CMIN
Model NPAR CMIN DF P CMIN/DF

Default model 13 121,665 8 ,000 15,208
Saturated model 21 ,000 0
Independence model 6 8726,410 15 ,000 581,761
RMR, GFI
Model           RMR   GFI   AGFI  PGFI
Default model   ,113  ,982  ,953  ,374
Baseline Comparisons
Model           NFI Delta1  RFI rho1  IFI Delta2  TLI rho2  CFI
Default model   ,986        ,974      ,987        ,976      ,987
RMSEA
Model           RMSEA  LO 90  HI 90  PCLOSE
Default model   ,078   ,066   ,091   ,000
AIC
Model AIC BCC BIC CAIC

Default model 147,665 147,743 222,462 235,462
Saturated model 42,000 42,127 162,826 183,826
Independence model 8738,410 8738,446 8772,932 8778,932
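The AIC column can be reproduced from the CMIN table: in this output each AIC equals the model χ² plus twice its number of parameters. A short check, using the NPAR and CMIN values reported above (the dictionary layout is illustrative):

```python
# (NPAR, CMIN) per model, from the CMIN table of the CFA example
models = {
    "Default model": (13, 121.665),
    "Saturated model": (21, 0.0),
    "Independence model": (6, 8726.410),
}

# AIC = chi-square + 2 * number of free parameters
aic = {name: round(cmin + 2 * npar, 3) for name, (npar, cmin) in models.items()}

for name, value in aic.items():
    print(name, value)
```

Lower values are preferred, so the penalty of 2 per parameter is what lets AIC trade fit against parsimony.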
125
Matrices (Group number 1 - Default model)
Residual Covariances (Group number 1 - Default model)
          stfeco  stfgov  stfdem  trstprl  trstplt  trstlgl
stfeco    ,000
stfgov    ,003    ,000
stfdem    ,106    -,053   ,000
trstprl   -,177   ,010    -,128   ,000
trstplt   -,136   ,138    -,062   ,052     ,000
trstlgl   ,055    ,007    ,374    ,021     -,136    ,000
Standardized Residual Covariances (Group number 1 - Default model)   ### Values < 2 ###
          stfeco  stfgov  stfdem  trstprl  trstplt  trstlgl
stfeco    ,000
stfgov    ,023    ,000
stfdem    ,860    -,387   ,000
trstprl   -1,356  ,072    -,956   ,000
trstplt   -1,161  1,066   -,515   ,367     ,000
trstlgl   ,426    ,047    2,827   ,140     -,990    ,000
Modification Indices (Group number 1 - Default model)
                   M.I.    Par Change
e7 <--> StfCntry   8,471   ,111
e7 <--> PolTrust   12,506  -,164
e8 <--> PolTrust   5,674   ,096
e9 <--> e7         10,516  ,146
e9 <--> e8         6,049   -,096
e4 <--> StfCntry   8,131   -,106
e4 <--> PolTrust   5,407   ,099
e4 <--> e7         5,200   -,101
e4 <--> e9         12,892  -,164
e5 <--> e7         7,757   -,125
e5 <--> e8         22,745  ,193
e5 <--> e9         6,924   -,122
e5 <--> e4         4,317   ,088
e6 <--> StfCntry   8,911   ,135
e6 <--> PolTrust   6,250   -,134
e6 <--> e8         19,909  -,211
e6 <--> e9         62,688  ,427
e6 <--> e5         13,862  -,193
If you repeat the analysis treating the covariance between e7 and StfCntry as a free parameter, its estimate will become larger by approximately 0,111 than it is in the present analysis.
PROBLEM: this is the change in covariances; the change in correlations is not known!
M.I. Par Change
stfeco <--- trstprl 4,641 -,027
stfeco <--- trstplt 6,938 -,036
stfgov <--- trstplt 11,171 ,041
stfdem <--- trstlgl 22,102 ,059
trstprl <--- stfeco 5,041 -,031
trstprl <--- stfdem 8,843 -,040
trstplt <--- trstlgl 4,962 -,027
trstlgl <--- stfdem 28,944 ,085
128
Model modification:
129
Observed, endogenous variables: trstlgl, trstplt, trstprl, stfdem, stfgov, stfeco
Unobserved, exogenous variables: PolTrust, e6, e5, e4, StfCntry, e9, e8, e7

stfeco 4,885
stfgov 3,734 5,534
stfdem 3,284 3,792 5,182
trstprl 3,007 3,861 3,152 6,020
trstplt 2,565 3,405 2,721 3,977 5,080
trstlgl 2,898 3,446 3,304 4,153 3,370 6,204

stfeco 1,000
stfgov ,718 1,000
stfdem ,653 ,708 1,000
trstprl ,554 ,669 ,564 1,000
trstplt ,515 ,642 ,530 ,719 1,000
trstlgl ,526 ,588 ,583 ,680 ,600 1,000
131


Chi-square = 56,072
132

Regression Weights:
                         Estimate  S.E.  C.R.    P    Label
trstlgl <--- PolTrust    ,882      ,020  43,145  ***  par_1
trstplt <--- PolTrust    ,846      ,018  46,790  ***  par_2
stfgov <--- StfCntry     1,196     ,026  46,064  ***  par_3
stfeco <--- StfCntry     ,979      ,024  41,131  ***  par_4
Standardized Regression Weights:
                         Estimate
trstlgl <--- PolTrust    ,765
trstplt <--- PolTrust    ,812
trstprl <--- PolTrust    ,881
stfdem <--- StfCntry     ,787
stfgov <--- StfCntry     ,910
stfeco <--- StfCntry     ,793
Covariances:
                         Estimate  S.E.  C.R.    P    Label
PolTrust <--> StfCntry   3,229     ,128  25,315  ***  par_5
e6 <--> e9               ,448      ,057  7,815   ***  par_6
Correlations:
                         Estimate
PolTrust <--> StfCntry   ,835
e6 <--> e9               ,199
Variances:
           Estimate  S.E.  C.R.    P    Label
PolTrust   4,674     ,182  25,727  ***  par_7
StfCntry   3,202     ,146  21,860  ***  par_8
e6         2,581     ,092  28,080  ***  par_9
e5         1,732     ,068  25,635  ***  par_10
e4         1,346     ,071  19,050  ***  par_11
e9         1,969     ,072  27,270  ***  par_12
e8         ,955      ,060  15,814  ***  par_13
e7         1,816     ,067  27,214  ***  par_14
Squared Multiple Correlations:
Estimate
stfeco ,628
stfgov ,827
stfdem ,619
trstprl ,776
trstplt ,659
trstlgl ,585
Matrices (Group number 1 - Default model)
Implied (for all variables) Covariances (Group number 1 - Default model)

StfCntry 3,202
PolTrust 3,229 4,674
stfeco 3,134 3,161 4,885
stfgov 3,828 3,862 3,748 5,534
stfdem 3,202 3,229 3,134 3,828 5,171
trstprl 3,229 4,674 3,161 3,862 3,229 6,020
trstplt 2,733 3,956 2,676 3,268 2,733 3,956 5,080
trstlgl 2,847 4,121 2,787 3,405 3,295 4,121 3,488 6,214
135

StfCntry 1,000
PolTrust ,835 1,000
stfeco ,793 ,662 1,000
stfgov ,910 ,759 ,721 1,000
stfdem ,787 ,657 ,624 ,716 1,000
trstprl ,736 ,881 ,583 ,669 ,579 1,000
trstplt ,678 ,812 ,537 ,616 ,533 ,715 1,000
trstlgl ,638 ,765 ,506 ,581 ,581 ,674 ,621 1,000
Implied Covariances (Group number 1 - Default model)

stfeco 4,885
stfgov 3,748 5,534
stfdem 3,134 3,828 5,171
trstprl 3,161 3,862 3,229 6,020
trstplt 2,676 3,268 2,733 3,956 5,080
trstlgl 2,787 3,405 3,295 4,121 3,488 6,214
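The residual covariances are simply the element-wise differences S - Σ(p). A sketch using the sample covariances and the implied covariances of the modified model as printed in this document (lower triangles only, and without the division by the standard error that Amos applies to obtain standardized residuals):

```python
names = ["stfeco", "stfgov", "stfdem", "trstprl", "trstplt", "trstlgl"]

# Lower triangle of the sample covariance matrix S
sample = [
    [4.885],
    [3.734, 5.534],
    [3.284, 3.792, 5.182],
    [3.007, 3.861, 3.152, 6.020],
    [2.565, 3.405, 2.721, 3.977, 5.080],
    [2.898, 3.446, 3.304, 4.153, 3.370, 6.204],
]

# Lower triangle of the implied covariance matrix of the modified model
implied = [
    [4.885],
    [3.748, 5.534],
    [3.134, 3.828, 5.171],
    [3.161, 3.862, 3.229, 6.020],
    [2.676, 3.268, 2.733, 3.956, 5.080],
    [2.787, 3.405, 3.295, 4.121, 3.488, 6.214],
]

# Residual = sample minus implied, element by element
residual = [
    [round(s - m, 3) for s, m in zip(s_row, m_row)]
    for s_row, m_row in zip(sample, implied)
]

for name, row in zip(names, residual):
    print(name, row)
```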
136
Implied Correlations (Group number 1 - Default model)

stfeco 1,000
stfgov ,721 1,000
stfdem ,624 ,716 1,000
trstprl ,583 ,669 ,579 1,000
trstplt ,537 ,616 ,533 ,715 1,000
trstlgl ,506 ,581 ,581 ,674 ,621 1,000

stfeco ,000
stfgov -,105 ,000
stfdem 1,223 -,268 ,072
trstprl -1,188 -,001 -,576 ,000
trstplt -,949 1,056 -,102 ,148 ,000
trstlgl ,865 ,293 ,060 ,215 -,861 -,058
137
Modification Indices (Group number 1 - Default model)
M.I. Par Change

e7 <--> StfCntry 5,054 ,085
e7 <--> PolTrust 7,595 -,129
e8 <--> StfCntry 4,415 -,066
e8 <--> PolTrust 7,139 ,107
e9 <--> e7 13,443 ,164
e4 <--> StfCntry 5,354 -,085
e4 <--> e7 5,321 -,102
e5 <--> e7 8,455 -,131
e5 <--> e8 13,574 ,148
e6 <--> e7 7,879 ,146
e6 <--> e5 8,934 -,153
M.I. Par Change

stfeco <--- stfdem 6,196 ,033
stfeco <--- trstplt 6,111 -,033
stfgov <--- trstplt 8,429 ,035
stfdem <--- stfeco 4,332 ,029
trstprl <--- stfeco 4,489 -,029
trstlgl <--- stfeco 4,035 ,032
Model Fit Summary
CMIN
Model NPAR CMIN DF P CMIN/DF

Default model 14 56,072 7 ,000 8,010
Saturated model 21 ,000 0
Independence model 6 8726,410 15 ,000 581,761
RMR, GFI
Model           RMR   GFI   AGFI  PGFI
Default model   ,074  ,992  ,976  ,331
Baseline Comparisons
Model           NFI Delta1  RFI rho1  IFI Delta2  TLI rho2  CFI
Default model   ,994        ,986      ,994        ,988      ,994
RMSEA
Model           RMSEA  LO 90  HI 90  PCLOSE
Default model   ,055   ,042   ,069   ,252
References
Anderson, J.C., & Gerbing, D.W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411-423.
Bagozzi, R.P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of the Academy of Marketing Science, 16(1), 74-94.
Chin, W.W., & Newsted, P.R. (1999). Structural equation modeling analysis with small samples using partial least squares. In Hoyle, R.H. (Ed.), Statistical Strategies for Small Sample Research (pp. 307-341). Thousand Oaks, CA: Sage Publications.
Churchill, G. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16 (February), 64-73.
Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39-50.
Gounaris, S., & Dimitriadis, S. (2003). Assessing service quality on the web: evidence from business-to-consumer portals. Journal of Services Marketing, 17(4/5), 529-548.
Grewal, D., Monroe, K.B., & Krishnan, R. (1998). The effects of price-comparison advertising on buyers' perceptions of acquisition value, transaction value, and behavioral intentions. Journal of Marketing, 62(2), 46-59.
Hair, J.F., Jr., Anderson, R.E., Tatham, R.L., & Black, W.C. (1999). Multivariate data analysis. London: Prentice Hall.
Hulland, J. (1999). Use of partial least squares (PLS) in strategic management research: a review of four recent studies. Strategic Management Journal, 20(2), 195-204.
Ladhari, R. (2010). Developing e-service quality scales: a literature review. Journal of Retailing and Consumer Services, 17, 464-477.
Ledden, L., Kalafatis, S., & Samouel, P. (2007). The relationship between personal values and perceived value of education. Journal of Business Research, 60, 965-974.
Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric Theory, 3rd ed. New York, NY: McGraw-Hill.
Ribbink, D., van Riel, A., Liljander, V., & Streukens, S. (2004). Comfort your online customer: quality, trust and loyalty on the internet. Managing Service Quality, 14(6), 446-456.
White, J.C., Varadarajan, P.R., & Dacin, P.A. (2003). Market situation interpretation and response: the role of cognitive style, organizational culture, and information use. Journal of Marketing, 67(3), 63-79.
Wolfinbarger, M., & Gilly, M.C. (2003). eTailQ: dimensionalizing, measuring and predicting etail quality. Journal of Retailing, 79(3), 183-198.
Amos Graphics