1
Introduction
2
History
In their most recent and advanced versions, SEM enable researchers to:
o Analyze non-normal data.
o Treat missing values by maximum likelihood.
o Treat complex sample data.
3
History of models for the study of causality
Analysis of variance (1920-1930): decomposition of the variance of a dependent
variable in order to identify the part contributed by an explanatory variable. Control
of third variables (experimental design).
SEM (1970): Econometric models, path analysis and factor analysis are joined
together. Relationships among variables measured with error, on non-experimental
data from an interdependence analysis perspective.
4
History of models for the study of causality
SEM are nowadays very popular because they make it possible to (5 Cs, see Batista &
Coenders 2000):
Work with Constructs/factors/latent variables measured through indicators/observed
variables/manifest variables, and evaluate measurement quality.
Consider the true Complexity of phenomena, thus abandoning univariate and bivariate
statistics.
Conjointly consider measurement and prediction, factor and path analysis, and thus
obtain estimates of relationships among variables that are free of measurement error
bias.
Introduce a Confirmatory perspective in statistical modelling. Prior to estimation, the
researcher must specify a model according to theory.
Decompose observed Covariances, and not only variances, from an interdependence
analysis perspective.
5
Basic Concepts
6
Basic Concepts
F1 causes F2
Changes in the values of the exogenous variables are not explained by the model.
Rather, they are considered to be influenced by other factors external to the model
(background variables such as gender, age, etc.).
Fluctuations in the endogenous variables are said to be explained by the model because
all latent variables that influence them are included in the model specification.
7
Statistical Modeling
Models explain how the observed and latent variables are related to one another.
Diagram
Equations
8
Types of variables
Observed variables
9
Covariance or correlation
10
Indirect relationship: both are related by an intervening variable v3
11
Factor analytic model
Factor loadings: Regression paths from the factors to the observed variables.
12
Full latent variable model
13
Example: European Social Survey (ESS)
Estonian data from ESS.
Year: 2012
ppltrst = Most people can be trusted or you can't be too careful (0 = You can't be too
careful; 10 = Most people can be trusted)
pplfair = Most people try to take advantage of you, or try to be fair (0 = Most people try
to take advantage of me; 10 = Most people try to be fair)
pplhlp = Most of the time people are helpful or mostly looking out for themselves (0 =
People mostly look out for themselves; 10 = People mostly try to be helpful)
trstprl = Trust in country's parliament (0 = No trust at all; 10 = Complete trust)
trstplt = Trust in politicians (0 = No trust at all; 10 = Complete trust)
trstlgl = Trust in the legal system (0 = No trust at all; 10 = Complete trust)
14
Sample Covariances
          ppltrst  pplfair  pplhlp   trstprl  trstplt  trstlgl
ppltrst   4,956
pplfair   2,698    4,980
pplhlp    2,055    2,225    5,093
trstprl   1,668    1,584    1,369    6,020
trstplt   1,404    1,300    1,103    3,977    5,080
trstlgl   1,705    1,602    1,355    4,153    3,370    6,204
Sample Correlations
          ppltrst  pplfair  pplhlp   trstprl  trstplt  trstlgl
ppltrst   1,000
pplfair   ,543     1,000
pplhlp    ,409     ,442     1,000
trstprl   ,305     ,289     ,247     1,000
trstplt   ,280     ,258     ,217     ,719     1,000
trstlgl   ,308     ,288     ,241     ,680     ,600     1,000
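The link between the two tables can be checked by hand: each correlation is the covariance divided by the product of the two standard deviations. The following is a minimal Python sketch of that arithmetic, not part of the original slides. The names `lower` and `cov_to_corr` are illustrative, and the decimal commas of the output are written as points.

```python
import math

# Lower triangle of the sample covariance matrix S
# (decimal commas of the listing written as decimal points).
names = ["ppltrst", "pplfair", "pplhlp", "trstprl", "trstplt", "trstlgl"]
lower = [
    [4.956],
    [2.698, 4.980],
    [2.055, 2.225, 5.093],
    [1.668, 1.584, 1.369, 6.020],
    [1.404, 1.300, 1.103, 3.977, 5.080],
    [1.705, 1.602, 1.355, 4.153, 3.370, 6.204],
]


def cov_to_corr(lower_triangle):
    """Divide every covariance by the product of the two standard deviations."""
    sd = [math.sqrt(row[i]) for i, row in enumerate(lower_triangle)]
    return [
        [row[j] / (sd[i] * sd[j]) for j in range(len(row))]
        for i, row in enumerate(lower_triangle)
    ]


corr = cov_to_corr(lower)
for name, row in zip(names, corr):
    print(f"{name:8s}" + " ".join(f"{r:6.3f}" for r in row))
```

Up to rounding, the printed lower triangle agrees with the Sample Correlations table.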
15
The path model
16
Basic composition
Measurement model: relations between observed and unobserved variables. CFA:
pattern by which each measure loads on a particular factor.
Structural model: Relations between unobserved variables.
A particular latent variable directly or indirectly influences (causes) changes in the values
of certain other latent variables in the model.
[Figure: path diagram with the structural model and the measurement model parts labelled]
17
Examples and basic concepts. Simple linear regression model.
Introduction to interdependence analysis
The specification of a SEM consists in a set of assumptions regarding the behaviour of
the variables involved.
Substantive part: it requires translating verbal theories into equations.
Statistical part: it is needed for the eventual estimation and testing of the model. The
assumptions regard the distribution of the variables involved.
18
Substantive assumptions:
$$v_2 = \beta_{21} v_1 + d_2$$
Linearity.
$\beta_{21}$ (effect): by how much will the expected value of v2 increase following a unit increase
in v1?
Standardized $\beta_{21}$: by how many standard deviations will the expected value of v2
increase following a standard deviation increase in v1?
d2 collects the effect of omitted explanatory variables, measurement error in v2 and the
random and unpredictable part of v2 (disturbance).
v1 is assumed to be free of measurement error.
19
Statistical assumptions regarding the joint distribution of the sources of variation:
$$\begin{pmatrix} v_1 \\ d_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 \\ 0 & \psi_{22} \end{pmatrix} \right)$$
Two additional parameters: the variances of v1 ($\phi_{11}$) and d2 ($\psi_{22}$).
Bivariate normal joint distribution of v1 and d2.
Variables are mean-centred.
v1 and d2 are uncorrelated (inclusion of all relevant variables).
If this holds, the variance of v2 can be additively decomposed into explained variance
and disturbance variance. $R^2$ is the explained proportion.
20
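In order to derive the structural equation system $\Sigma=\Sigma(\pi)$ we can apply path analysis:
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$$
For a model with k observed variables, the number of distinct elements in $\Sigma$ is (k+1)k/2.
$\pi = (\phi_{11}, \psi_{22}, \beta_{21})$
$$\begin{aligned}
\sigma_{11} &= \phi_{11} \\
\sigma_{21} &= \phi_{11}\beta_{21} \\
\sigma_{22} &= \psi_{22} + \beta_{21}\sigma_{21} = \psi_{22} + \phi_{11}\beta_{21}^2
\end{aligned}$$
It is possible to solve the system $\Sigma(\pi)=\Sigma$ as it contains an equal number of equations
(distinct elements of $\Sigma$) and unknowns (elements of $\pi$): the model is exactly identified.
We can estimate $p = (\hat\phi_{11}, \hat\psi_{22}, \hat\beta_{21})$ by solving
the system $\Sigma(p)=S$:
$$\begin{aligned}
\hat\phi_{11} = s_{11} \quad &\Rightarrow \quad \hat\phi_{11} = s_{11} \\
\hat\phi_{11}\hat\beta_{21} = s_{21} \quad &\Rightarrow \quad \hat\beta_{21} = \frac{s_{21}}{s_{11}} \\
\hat\psi_{22} + \hat\phi_{11}\hat\beta_{21}^2 = s_{22} \quad &\Rightarrow \quad \hat\psi_{22} = s_{22} - \frac{s_{21}^2}{s_{11}}
\end{aligned}$$
$$S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}$$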
22
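Example
trstprl = $\beta_{21}$ * ppltrst + d2
Trust in the country's parliament (v2 = trstprl) can be explained by the level of trust in others (v1 = ppltrst):
$$S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix} = \begin{pmatrix} 4.956 & 1.668 \\ 1.668 & 6.020 \end{pmatrix}$$
$\hat\phi_{11} = s_{11} = 4.956$
$\hat\beta_{21} = s_{21}/s_{11} = 1.668/4.956 = 0.34$
$\hat\psi_{22} = s_{22} - s_{21}^2/s_{11} = 6.020 - (1.668^2/4.956) = 5.46$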
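The arithmetic of this exactly identified example can be reproduced in a few lines. The sketch below is illustrative only: the variable names are ours, and decimal commas are written as points. It solves the three structural equations for the three parameters and also derives the R² and the standardized coefficient that the Amos and SPSS listings report.

```python
# Sample moments for v1 = ppltrst and v2 = trstprl
# (decimal commas of the listing written as decimal points).
s11 = 4.956  # variance of ppltrst
s21 = 1.668  # covariance between trstprl and ppltrst
s22 = 6.020  # variance of trstprl

# Solve Sigma(p) = S: three equations, three unknowns (df = 0).
phi11 = s11                      # variance of v1
beta21 = s21 / s11               # effect of v1 on v2
psi22 = s22 - s21 ** 2 / s11     # disturbance variance

# Derived quantities.
r_squared = 1 - psi22 / s22                  # explained proportion of var(v2)
beta21_std = beta21 * (s11 / s22) ** 0.5     # standardized effect

print(f"phi11 = {phi11:.3f}, beta21 = {beta21:.3f}, psi22 = {psi22:.3f}")
print(f"R2 = {r_squared:.3f}, standardized beta21 = {beta21_std:.3f}")
```

Because the model has no degrees of freedom, plugging the estimates back into the structural equations returns s11, s21 and s22 exactly.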
23
$\hat\beta_{21}$ is identical to the ordinary least squares estimate (dependence analysis).
In statistical analysis, a function of residuals (e.g. the sum of squares) is used as:
A criterion function to minimize during estimation.
ppltrst trstprl
ppltrst 4,956
trstprl 1,668 6,020
26
Notes for Model (Default model)
27
Estimates (Group number 1 - Default model)
Estimate
trstprl <--- ppltrst ,305
28
Squared Multiple Correlations: (Group number 1 - Default model)
Estimate
trstprl ,093 (R2)
Residual covariances:
          ppltrst  trstprl
ppltrst   ,000
trstprl   ,000     ,000
In an exactly identified model they are zero, as $S=\Sigma(p)$ has a solution.
29
Simple Regression with SPSS: trstprl = $\beta_{21}$ * ppltrst + d2
Coefficients (a)
Model                                Unstandardized coefficients   Standardized coef.   t        Sig.
                                     B        s.e.                 Beta
1  (Constant)                        2,093    ,129                                      16,238   ,000
   Most people can be trusted        ,337     ,022                 ,305                 15,476   ,000
   or you can't be too careful
a. Dependent variable: Trust in country's parliament
30
Model with two dependent variables and an indirect effect.
Identification, goodness of fit and specification errors
$v_2=\beta_{21}v_1+d_2 \qquad v_3=\beta_{32}v_2+d_3$
31
Structural equation system:
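$$\begin{aligned}
\sigma_{11} &= \phi_{11} \\
\sigma_{21} &= \phi_{11}\beta_{21} \\
\sigma_{22} &= \psi_{22} + \beta_{21}\sigma_{21} \\
\sigma_{31} &= \phi_{11}\beta_{21}\beta_{32} \\
\sigma_{32} &= \sigma_{22}\beta_{32} \\
\sigma_{33} &= \psi_{33} + \beta_{32}\sigma_{32}
\end{aligned}$$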
32
AMOS OUTPUT
33
Variable Summary (Group number 1)
34
Parameter Summary (Group number 1)
35
Notes for Model (Default model)
36
Standardized Regression Weights: (Group number 1 - Default model)
Estimate
trstprl <--- ppltrst ,305
stfeco <--- trstprl ,554
Squared Multiple Correlations: (Group number 1 - Default model)
Estimate
trstprl ,093
stfeco ,307
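$$\begin{aligned}
\sigma_{22} &= \psi_{22} + \beta_{21}\sigma_{21} \\
\sigma_{31} &= \phi_{11}\beta_{21}\beta_{32} \\
\sigma_{32} &= \sigma_{22}\beta_{32} \\
\sigma_{33} &= \psi_{33} + \beta_{32}\sigma_{32}
\end{aligned}$$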
38
The existence of degrees of freedom affects estimation.
In general, no vector of estimates p will exactly satisfy $\Sigma(p)=S$.
Estimation consists in finding a p vector that leads to an $S-\Sigma(p)$ matrix with small values.
A function of all elements in $S-\Sigma(p)$, called the fit function, is minimized ($F_{min}$).
The existence of degrees of freedom makes it possible to test the model fit. A model
with df=0 leads to a p vector that always fulfils $\Sigma(p)=S$, that is $S-\Sigma(p)=0$, and thus perfectly
fits any data set.
In a correct model with df>0, $\Sigma(\pi)=\Sigma$ in the population and $\Sigma(p)\approx S$ in the sample. If
$S-\Sigma(p)$ contains large values, we can say that some of the restrictions are false.
If assumptions are fulfilled and under H0 (null hypothesis: the model contains all
necessary parameters), a transformation of the minimum value of the fit function
follows a $\chi^2$ distribution, which makes it possible to test the model restrictions (significance of
omitted parameters). Note that standard testing procedures in statistical modelling
(e.g. t-values) test the parameters which are present in the model.
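A small numerical sketch can show what one degree of freedom means in practice. It is not the author's example. The covariances of stfeco are not listed in these slides, so the sketch assumes a hypothetical chain ppltrst → trstprl → trstlgl built from the covariance table. It uses equation-by-equation least-squares values as a simple stand-in for minimizing a fit function. With six sample moments and five parameters, one moment cannot be reproduced.

```python
# Sample moments (decimal commas written as points).
# Hypothetical chain for illustration only: v1 = ppltrst, v2 = trstprl, v3 = trstlgl.
s = {"11": 4.956, "21": 1.668, "22": 6.020,
     "31": 1.705, "32": 4.153, "33": 6.204}

# Equation-by-equation least-squares values for the five parameters.
phi11 = s["11"]
beta21 = s["21"] / s["11"]
psi22 = s["22"] - beta21 * s["21"]
beta32 = s["32"] / s["22"]
psi33 = s["33"] - beta32 * s["32"]

# Implied moments from the path-analysis equations of the two-equation model.
sigma21 = phi11 * beta21
sigma22 = psi22 + beta21 * sigma21
sigma32 = sigma22 * beta32
implied = {
    "11": phi11,
    "21": sigma21,
    "22": sigma22,
    "31": phi11 * beta21 * beta32,   # only indirect effect, no direct v1 -> v3 path
    "32": sigma32,
    "33": psi33 + beta32 * sigma32,
}

residuals = {key: s[key] - implied[key] for key in s}
df = len(s) - 5  # six distinct moments minus five parameters

for key, value in residuals.items():
    print(f"s{key} - sigma{key}(p) = {value: .3f}")
print("degrees of freedom:", df)
```

Only the residual for the pair (v3, v1) is non-zero. It is this discrepancy that carries the information used to test the restriction that v1 has no direct effect on v3.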
39
Specification errors
Errors such as the omission of important explanatory variables, the omission of model
parameters, or the inclusion of wrong restrictions are known as specification errors.
Specification errors are frequent. In general, a specification error can bias any
parameter estimate.
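If the model is incorrect because v3 receives a direct effect from v1:
$v_2=\beta_{21}v_1+d_2$
$v_3=\beta_{31}v_1+\beta_{32}v_2+d_3$
and we apply path analysis, then we observe that the new parameter affects $\sigma_{31}$, $\sigma_{32}$ and $\sigma_{33}$:
40
$$\begin{aligned}
\sigma_{11} &= \phi_{11} \\
\sigma_{21} &= \phi_{11}\beta_{21} \\
\sigma_{22} &= \psi_{22} + \beta_{21}\sigma_{21} \\
\sigma_{31} &= \phi_{11}\beta_{21}\beta_{32} + \phi_{11}\beta_{31} \\
\sigma_{32} &= \sigma_{22}\beta_{32} + \sigma_{21}\beta_{31} \\
\sigma_{33} &= \psi_{33} + \beta_{32}\sigma_{32} + \beta_{31}\sigma_{31}
\end{aligned}$$
If we fit the model without $\beta_{31}$ to covariances generated by these equations, $\sigma_{31}$ is affected by
the absent parameter $\beta_{31}$ but is fitted only by the present parameters $\beta_{21}$ and $\beta_{32}$. The estimates of $\beta_{21}$ and
$\beta_{32}$ will be biased.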
41
Attempts must be made to detect specification errors by all means, both statistical and
theoretical:
Specification errors are undetectable in any model with df=0. They are also
undetectable if they involve variables that are NOT in the model.
It can happen that many models with different interpretations have a similarly good
fit, even an exactly equal fit (equivalent models).
42
If we estimate a general model:
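$v_1=\beta_{12}v_2+d_1$
$v_2=\beta_{21}v_1+\beta_{23}v_3+d_2$
$v_3=\beta_{32}v_2+d_3$
$$\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \psi_{11} & 0 & 0 \\ 0 & \psi_{22} & 0 \\ 0 & 0 & \psi_{33} \end{pmatrix} \right)$$
then the parameter vector includes 7 elements $\pi=(\psi_{11}, \psi_{22}, \psi_{33}, \beta_{12}, \beta_{21}, \beta_{23}, \beta_{32})$ versus
6 (3*4/2) equations: infinite number of solutions (underidentified model).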
43
Identification of the model
Just identified model (df=0): the number of data variances and covariances equals the
number of parameters. The model yields a unique solution for all parameters, but
scientifically it is not interesting because, without degrees of freedom, it can never be
rejected. No assessment of the goodness of fit of the model is possible.
Overidentified model (df>0): the model can be rejected. The discrepancy between S and
$\Sigma(p)$ can be analyzed, which makes the model scientifically useful. The aim in SEM
is therefore to specify overidentified models.
Underidentified model: infinite number of solutions. Not useful at all.
44
Simple regression model with errors in the explanatory variable.
Introduction to models with measurement error
The observed explanatory variable (v1) is measured with error (e1). The unobservable
error-free value f1 is called factor or latent variable.
f2 is observed because e2 is for the moment assumed to be zero.
Two equation types:
Relating factors to one another:
$f_2=\beta_{21}f_1+d_2$
Relating factors to observed variables or indicators:
$v_1=f_1+e_1$
$v_2=f_2$
45
Assumptions:
Measurement errors are uncorrelated with factors (as in factor analysis).
Disturbances are uncorrelated with the explanatory factor (as in regression).
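$$\begin{pmatrix} f_1 \\ e_1 \\ d_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 & 0 \\ 0 & \theta_{11} & 0 \\ 0 & 0 & \psi_{22} \end{pmatrix} \right)$$
The reliability of $v_1$, written $\kappa_1$, is the proportion of its variance that is explained by $f_1$:
$$\kappa_1 = \frac{\sigma_{11}-\theta_{11}}{\sigma_{11}} = 1-\frac{\theta_{11}}{\sigma_{11}} = \frac{\phi_{11}}{\sigma_{11}}$$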
46
The structural equations become:
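$$\begin{aligned}
\sigma_{11} &= \phi_{11} + \theta_{11} \\
\sigma_{21} &= \phi_{11}\beta_{21} \\
\sigma_{22} &= \psi_{22} + \phi_{11}\beta_{21}^2
\end{aligned}$$
Underidentified model: 4 parameters ($\phi_{11}$, $\theta_{11}$, $\beta_{21}$, $\psi_{22}$) and three variances and
covariances (only those of observed variables count).
The OLS estimator assumes that $\theta_{11}=0$, which is a specification error and leads to bias.
The probability limit of the OLS estimator is:
$$\frac{s_{21}}{s_{11}} \rightarrow \frac{\sigma_{21}}{\sigma_{11}} = \frac{\beta_{21}\phi_{11}}{\phi_{11}+\theta_{11}} = \beta_{21}\kappa_1$$
where $\kappa_1 = \phi_{11}/(\phi_{11}+\theta_{11})$ is the reliability of $v_1$. The estimator is thus biased unless $\kappa_1=1$.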
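The attenuation result can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original material. The parameter values (β21 = 0.5, φ11 = 1, θ11 = 0.5, ψ22 = 1), the sample size and the seed are all hypothetical, and only the Python standard library is used.

```python
import random

random.seed(12345)

# Hypothetical population values, chosen only for illustration.
beta21 = 0.5
phi11, theta11, psi22 = 1.0, 0.5, 1.0
n = 50_000

v1, v2 = [], []
for _ in range(n):
    f1 = random.gauss(0.0, phi11 ** 0.5)    # error-free explanatory factor
    e1 = random.gauss(0.0, theta11 ** 0.5)  # measurement error in v1
    d2 = random.gauss(0.0, psi22 ** 0.5)    # disturbance
    v1.append(f1 + e1)                      # v1 = f1 + e1
    v2.append(beta21 * f1 + d2)             # v2 = f2 = beta21 * f1 + d2


def cov(x, y):
    """Unbiased sample covariance of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


ols_slope = cov(v1, v2) / cov(v1, v1)   # s21 / s11
kappa1 = phi11 / (phi11 + theta11)      # reliability of v1
expected_limit = beta21 * kappa1        # probability limit of the OLS slope

print(f"true beta21 = {beta21}, OLS slope = {ols_slope:.3f}, "
      f"beta21 * kappa1 = {expected_limit:.3f}")
```

The OLS slope settles near β21κ1 rather than near β21. The shortfall grows as the error variance θ11 grows relative to φ11.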
47
Simple linear regression model with multiple indicators
The solution to measurement error bias in SEM involves the use of multiple indicators,
at least of the explanatory latent variables.
The equations relating factors to indicators become:
$f_2=\beta_{21}f_1+d_2$
$v_1=1 \cdot f_1+e_1$
$v_2=f_2$
$v_3=\lambda_{31}f_1+e_3$
48
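The equation includes a loading $\lambda_{31}$ (L31) which relates the scales of f1 and v3.
The researcher must fix the latent variable scale, usually by anchoring it to the
measurement units of an indicator whose $\lambda$ equals 1.
Standardized instead of raw loadings are usually interpreted. If there is only one
factor per indicator, they lie within -1 and +1 and equal the square root of the reliability $\kappa$.
$$\begin{pmatrix} f_1 \\ e_1 \\ e_3 \\ d_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \phi_{11} & 0 & 0 & 0 \\ 0 & \theta_{11} & 0 & 0 \\ 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & \psi_{22} \end{pmatrix} \right)$$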
49
Addition of v3 in the structural equations. This is an exactly identified model, all of
whose parameters can be solved, even those related to unobservable variables. The
extent to which multiple indicators of the same construct converge (correlate) provides
information to estimate the parameters.
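Structural equations:
$$\begin{aligned}
\sigma_{11} &= 1^2\phi_{11} + \theta_{11} = \phi_{11} + \theta_{11} \\
\sigma_{21} &= \phi_{11}\beta_{21} \cdot 1 \\
\sigma_{22} &= \psi_{22} + \phi_{11}\beta_{21}^2 \\
\sigma_{31} &= \phi_{11}\lambda_{31} \cdot 1 \\
\sigma_{32} &= \phi_{11}\beta_{21}\lambda_{31} \\
\sigma_{33} &= \phi_{11}\lambda_{31}^2 + \theta_{33}
\end{aligned}$$
Solution:
$$\begin{aligned}
\lambda_{31} &= \sigma_{32}/\sigma_{21} \\
\phi_{11} &= \sigma_{31}/\lambda_{31} \\
\beta_{21} &= \sigma_{21}/\phi_{11} \\
\theta_{11} &= \sigma_{11} - \phi_{11} \\
\theta_{33} &= \sigma_{33} - \phi_{11}\lambda_{31}^2 \\
\psi_{22} &= \sigma_{22} - \phi_{11}\beta_{21}^2
\end{aligned}$$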
50
Applied example
Predicting f2 = v2 = trstprl from Social Trust (f1 = SocT), measured by its two indicators (v1 = ppltrst and v3 = pplhlp):
ppltrst = 1 * SocT + e1
pplhlp = $\lambda_{31}$ * SocT + e3
trstprl = trstprl (v2 = f2)
trstprl = $\beta_{21}$ * SocT + d2
51
The equivalence between the observed and latent dependent variable makes it possible
to simplify the path and equations as:
ppltrst = 1 * SocT + e1
pplhlp = $\lambda_{31}$ * SocT + e3
trstprl = $\beta_{21}$ * SocT + d2
We define a latent variable called Social Trust, measured by ppltrst and pplhlp. The
loading of ppltrst (first indicator) is constrained to 1 in order to fix the scale of the latent
variable. Each indicator automatically receives a $\theta$ parameter (the variance of its error $e_i$).
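Because this model is exactly identified, its parameters can be obtained directly from the sample covariances of ppltrst, trstprl and pplhlp by solving the six structural equations. The sketch below does this by hand under that assumption. It is an illustration with our own variable names, not a reproduction of the Amos listing, whose estimates are not shown in these slides.

```python
# v1 = ppltrst, v2 = trstprl, v3 = pplhlp (decimal commas written as points).
s11, s21, s22 = 4.956, 1.668, 6.020
s31, s32, s33 = 2.055, 1.369, 5.093

# Solve the six structural equations for the six parameters.
lambda31 = s32 / s21                    # loading of pplhlp on SocT
phi11 = s31 / lambda31                  # variance of SocT
beta21 = s21 / phi11                    # effect of SocT on trstprl
theta11 = s11 - phi11                   # error variance of ppltrst
theta33 = s33 - phi11 * lambda31 ** 2   # error variance of pplhlp
psi22 = s22 - phi11 * beta21 ** 2       # disturbance variance

kappa1 = phi11 / s11                    # reliability of ppltrst
ols_slope = s21 / s11                   # slope that ignores measurement error

print(f"lambda31 = {lambda31:.3f}, phi11 = {phi11:.3f}, beta21 = {beta21:.3f}")
print(f"theta11 = {theta11:.3f}, theta33 = {theta33:.3f}, psi22 = {psi22:.3f}")
print(f"OLS slope = {ols_slope:.3f} = beta21 * kappa1 = {beta21 * kappa1:.3f}")
```

The last line ties the result back to the attenuation formula: the uncorrected slope s21/s11 equals the corrected effect multiplied by the reliability of ppltrst.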
52
AMOS OUTPUT
53
The model is recursive.
Sample size = 2330
54
Parameter Summary (Group number 1)
55
Notes for Model (Default model)
56
Estimates (Group number 1 - Default model)
57
Variances: (Group number 1 - Default model)
58
Full SEM model
Example:
Independent variables (8):
6 errors: e1, e2, e3, e4, e5, e6
1 disturbance: d1
1 latent variable: SocialTrust
Dependent variables (7):
6 represent observed variables: ppltrst; pplfair; pplhlp; trstprl; trstplt; trstlgl
1 represents an unobserved variable (the factor Political Trust).
59
Definition of the model:
Model Equations
Structural model:
Political Trust = ? * SocialTrust + d1
Social Trust measurement scale model:
ppltrst = 1 * SocialTrust + e1
pplfair = ? * SocialTrust + e2
pplhlp = ? * SocialTrust + e3
Political Trust measurement scale model:
trstprl = 1 * Political Trust + e4
trstplt = ? * Political Trust + e5
trstlgl = ? * Political Trust + e6
60
Rules for determining the model parameters
Rule 1: All the variances of the independent variables are parameters.
Rule 2: All covariances between independent variables are parameters.
Rule 3: All factor loadings between a latent variable and its indicators are parameters.
Rule 4: All regression coefficients between observed or latent variables are
parameters.
Rule 5: (i) The variances of dependent variables, (ii) the covariances between
dependent variables and (iii) the covariances between dependent and independent
variables are never parameters (they are explained by other parameters of the model).
Rule 6: The metric of each latent variable must be set.
For an independent latent variable there are two ways:
Set its variance to a constant (usually 1).
Fix a factor loading ($\lambda$) between the latent variable and one of its indicators (usually to 1).
61
Determining the model parameters:
For a dependent latent variable there is only one way: fix a coefficient between it and one of the
observed variables to a constant (usually 1).
An equation for each variable (latent or observed) that receives a one-way arrow
(dependent variables) (7)
As many variances as independent variables (8)
As many covariances as two-way arrows (0)
62
Computation of degrees of freedom (Default model)
Number of distinct sample moments: 21 = (6 * 7/2)
Number of distinct parameters to be estimated: 13
8 variances of independent variables
4 free factor loadings (latent factors to indicators)
1 regression coefficient
Degrees of freedom (21 - 13): 8
63
Determining the model parameters:
Adding a covariance between e5 and e6, we introduce a parameter, and we lose one
degree of freedom.
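The counting rules above can be written as a small helper. This is an illustrative sketch with hypothetical function names. It reproduces the 21 - 13 = 8 degrees of freedom of the default model and the loss of one degree of freedom when the covariance between e5 and e6 is added.

```python
def distinct_moments(n_observed):
    """Number of distinct variances and covariances among k observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed, n_variances, n_covariances,
                       n_free_loadings, n_regressions):
    """Distinct sample moments minus free parameters."""
    n_parameters = n_variances + n_covariances + n_free_loadings + n_regressions
    return distinct_moments(n_observed) - n_parameters


# Default model: 6 indicators, 8 variances of independent variables,
# no covariances, 4 free loadings, 1 regression coefficient.
df_default = degrees_of_freedom(6, 8, 0, 4, 1)

# Same model with a covariance between e5 and e6 added.
df_with_covariance = degrees_of_freedom(6, 8, 1, 4, 1)

print(df_default, df_with_covariance)
```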
64
Confirmatory Factor Analysis CFA.
Introduction to reliability and validity assessment
[Figure: path diagram of a CFA model with two correlated factors and two indicators per factor]
65
This model does not contain equations relating factors to one another but only
covariances. All factors are exogenous. No $\beta$ or $\psi$ parameters, only $\lambda$, $\phi$ and $\theta$.
At least three indicators are needed for models with one factor and two for models
with more factors.
In CFA models it is possible to standardize factors to unit variances instead of fixing a
loading to 1. Then the $\phi$ parameters are factor correlations.
For 2 factors and 2 indicators we have the following equations:
$v_1=\lambda_{11}f_1+e_1 \qquad v_2=\lambda_{21}f_1+e_2$
$v_3=\lambda_{32}f_2+e_3 \qquad v_4=\lambda_{42}f_2+e_4$
66
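The model has df=1. $\phi_{11}=\phi_{22}=1$:
$$\begin{pmatrix} f_1 \\ f_2 \\ e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \phi_{21} & 0 & 0 & 0 & 0 \\ \phi_{21} & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \theta_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & \theta_{44} \end{pmatrix} \right)$$
$$\begin{aligned}
\sigma_{11} &= 1 \cdot \lambda_{11}^2 + \theta_{11} \\
\sigma_{21} &= 1 \cdot \lambda_{11}\lambda_{21} \\
\sigma_{22} &= 1 \cdot \lambda_{21}^2 + \theta_{22} \\
\sigma_{31} &= \phi_{21}\lambda_{32}\lambda_{11} \\
\sigma_{32} &= \phi_{21}\lambda_{32}\lambda_{21} \\
\sigma_{33} &= 1 \cdot \lambda_{32}^2 + \theta_{33} \\
\sigma_{41} &= \phi_{21}\lambda_{42}\lambda_{11} \\
\sigma_{42} &= \phi_{21}\lambda_{42}\lambda_{21} \\
\sigma_{43} &= 1 \cdot \lambda_{42}\lambda_{32} \\
\sigma_{44} &= 1 \cdot \lambda_{42}^2 + \theta_{44}
\end{aligned}$$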
67
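The correlation between two indicators of the same factor depends on their reliabilities $\kappa$:
$$\kappa_1 = \frac{\sigma_{11}-\theta_{11}}{\sigma_{11}} = 1-\frac{\theta_{11}}{\sigma_{11}} = \frac{\lambda_{11}^2}{\sigma_{11}}$$
$$\rho_{21} = \frac{\sigma_{21}}{\sqrt{\sigma_{11}\sigma_{22}}} = \frac{\lambda_{11}\lambda_{21}}{\sqrt{\sigma_{11}\sigma_{22}}} = \sqrt{\frac{\lambda_{11}^2}{\sigma_{11}}\,\frac{\lambda_{21}^2}{\sigma_{22}}} = \sqrt{\kappa_1\kappa_2}$$
and the correlation between two indicators of different factors is attenuated with
respect to the correlation between factors (effect of measurement error):
$$\rho_{31} = \frac{\sigma_{31}}{\sqrt{\sigma_{11}\sigma_{33}}} = \frac{\phi_{21}\lambda_{11}\lambda_{32}}{\sqrt{\sigma_{11}\sigma_{33}}} = \phi_{21}\sqrt{\frac{\lambda_{11}^2}{\sigma_{11}}\,\frac{\lambda_{32}^2}{\sigma_{33}}} = \phi_{21}\sqrt{\kappa_1\kappa_3}$$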
A CFA model is likely to fit the data only if items of the same factor correlate highly and
higher than items of different factors. We advise researchers to carefully examine the
correlation matrix prior to fitting a CFA model.
68
Reliability for each item:
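trstprl: $\kappa_1 = \lambda_1^2 = 0.735$    trstlgl: $\kappa_2 = \lambda_2^2 = 0.629$
Correlations:
$\rho_{21} = \sqrt{\kappa_1\kappa_2} = \sqrt{0.735 \times 0.629} = 0.680$
$\rho_{31} = \phi_{21}\sqrt{\kappa_1\kappa_3} = 0.862 \times \sqrt{\kappa_1\kappa_3} = 0.669$
$\rho_{41}$ ????
$\rho_{32}$ ????
$\rho_{34}$ ????
$\rho_{42}$ ????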
69
Amos Output
Variable Summary (Group number 1)
70
Sample Covariances (Group number 1)
71
Estimates (Group number 1 - Default model)
Estimate ($\lambda_i$)
stfgov <--- SATCNTRY ,888
stfdem <--- SATCNTRY ,797
trstlgl <--- PolTrust ,793
trstprl <--- PolTrust ,857
72
Squared Multiple Correlations: $1-\lambda_i^2=\theta_i$
          Estimate
stfgov    0,789    1-0,789=0,211
stfdem    0,635    1-0,635=0,365
trstlgl   0,629    1-0,629=0,371
trstprl   0,735    1-0,735=0,265
Correlations:
Estimate
SATCNTRY <--> PolTrust ,862
The same value as the covariance. This is because the variance of the latent
factors is fixed to 1 ($\phi_{11}=\phi_{22}=1$).
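The relation between the standardized loadings, the squared multiple correlations and the error variances can be checked numerically. The following is a minimal sketch using the standardized loadings printed in the Amos output. The dictionary names are ours, and small differences in the third decimal come from rounding of the printed loadings.

```python
# Standardized loadings from the Amos listing (decimal commas written as points).
loadings = {
    "stfgov": 0.888,
    "stfdem": 0.797,
    "trstlgl": 0.793,
    "trstprl": 0.857,
}

# With standardized variables and one factor per indicator,
# reliability = squared loading and error variance = 1 - reliability.
reliability = {item: lam ** 2 for item, lam in loadings.items()}
error_variance = {item: 1 - rel for item, rel in reliability.items()}

# Implied correlation between two indicators of the same factor (PolTrust).
rho_trstprl_trstlgl = loadings["trstprl"] * loadings["trstlgl"]

for item in loadings:
    print(f"{item:8s} reliability = {reliability[item]:.3f}, "
          f"error variance = {error_variance[item]:.3f}")
print(f"implied corr(trstprl, trstlgl) = {rho_trstprl_trstlgl:.3f}")
```

The implied correlation between trstprl and trstlgl agrees with the sample correlation of ,680 reported earlier.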
73
Variances: (Group number 1 - Default model)
74
Purification of the measures
Item-total correlation serves as a criterion for initial assessment and purification.
Various cut-off points are adopted:
0.30 by Cristobal et al. (2007)
0.40 by Loiacono et al. (2002)
0.50 by Francis and White (2002) and Kim and Stoel (2004)
Wolfinbarger and Gilly (2003) are rigorous in retaining only items that
load at 0.50 or more on a factor
do not load at more than 0.50 on two factors
have an item-total correlation of more than 0.40
75
Reliability and Validity
A measure is reliable to the extent that independent but comparable measures of the
same trait or construct of a given object agree. Reliability depends on how much of
the variation in scores is attributable to random or chance errors. If a measure is
perfectly reliable, the random error component is zero (XR = 0).
A measure is valid when the differences in observed scores reflect true differences
on the characteristic one is attempting to measure and nothing else, that is, XO = XT.
76
b) Reliability of a construct or internal consistency of the scale
It makes it possible to check the internal consistency of all the indicators that measure the concept
(the thoroughness with which all indicators measure the same thing).
Internal homogeneity of a set of items:
o Composite reliability (CR) greater than 0.70 (Anderson and Gerbing, 1988;
Bagozzi and Yi, 1988)
o Correlation between each item and its construct >0.5 and correlations among
items from the same construct >0.3.
o Cronbach's $\alpha$ greater than 0.70 (Fornell and Larcker, 1981; Nunnally and
Bernstein, 1994)
77
c) Convergent validity
the extent to which a set of items assumed to represent a construct does in fact
converge on the same construct:
o Average variance extracted (AVE - amount of variance that a construct obtains
from the indicators in relation to the amount of variance of the measurement
error) greater than 0.50 (Fornell and Larcker, 1981; Chin and Newsted, 1999;
Gounaris and Dimitriadis, 2003)
o Factor loadings greater than 0.5 (Grewal et al., 1998).
d) Discriminant validity (when there are several scales in the model)
the extent to which measures of theoretically unrelated constructs do not correlate
with one another:
o Inter-factor correlations are less than the square root of the average variance
extracted (AVE) (Fornell and Larcker, 1981)
78
Numerical example:
Reliability of a construct
79
Convergent validity
1st) Factor loadings are significant and greater than 0.5
2nd) Average Variance Extracted (AVE) for each of the factors > 0.5.
Estimate
stfgov 0,789 (0.789+0.635)/2= 0.712 SATCNTRY
stfdem 0,635
trstlgl 0,629 (0.629+0.735)/2= 0.682 PolTrust
trstprl 0,735
80
Discriminant Validity of construct
Average variance extracted (AVE). A construct must share more variance with its
indicators than with other constructs of the model. This holds when $\sqrt{AVE}$ of each
factor in a pair > estimated correlation between those factors.
Diagonal: $\sqrt{AVE}$. Off-diagonal: estimated correlation between factors.
          StfCntry  PolTrust
StfCntry  0.844
PolTrust  0.862     0.826
sqrt(0.712)=0.844 sqrt(0.682)=0.826
Correlations:
Estimate
SATCNTRY <--> PolTrust ,862
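The convergent and discriminant validity checks can be sketched in code. AVE is computed here as the mean squared standardized loading, which matches the averages shown above. The composite reliability formula, (Σλ)² / ((Σλ)² + Σθ), is the usual textbook expression and is an assumption of this sketch, since the slides do not print it. All names are illustrative.

```python
import math

# Standardized loadings per factor (decimal commas written as points).
loadings = {
    "SATCNTRY": {"stfgov": 0.888, "stfdem": 0.797},
    "PolTrust": {"trstlgl": 0.793, "trstprl": 0.857},
}
factor_correlation = 0.862  # SATCNTRY <--> PolTrust


def ave(lams):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in lams) / len(lams)


def composite_reliability(lams):
    """Assumed textbook formula: (sum lambda)^2 / ((sum lambda)^2 + sum theta)."""
    total = sum(lams) ** 2
    error = sum(1 - lam ** 2 for lam in lams)
    return total / (total + error)


ave_by_factor = {f: ave(list(items.values())) for f, items in loadings.items()}
cr_by_factor = {f: composite_reliability(list(items.values()))
                for f, items in loadings.items()}

# Criterion of the slide: sqrt(AVE) of each factor must exceed the
# estimated correlation between the factors.
discriminant_ok = all(math.sqrt(a) > factor_correlation
                      for a in ave_by_factor.values())

for f in loadings:
    print(f"{f}: AVE = {ave_by_factor[f]:.3f}, "
          f"sqrt(AVE) = {math.sqrt(ave_by_factor[f]):.3f}, "
          f"CR = {cr_by_factor[f]:.3f}")
print("discriminant validity criterion met:", discriminant_ok)
```

With these estimates both AVE values exceed 0.50, but the factor correlation of 0.862 is larger than both square roots of AVE, so the discriminant validity criterion is not met for this pair of factors.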
81
Exploratory Factor Analysis (EFA)
82
Comparison with exploratory factor analysis (EFA)
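$v_1=\lambda_{11}f_1+\lambda_{12}f_2+e_1$
$v_2=\lambda_{21}f_1+\lambda_{22}f_2+e_2$
$v_3=\lambda_{31}f_1+\lambda_{32}f_2+e_3$
$v_4=\lambda_{41}f_1+\lambda_{42}f_2+e_4$
$$\begin{pmatrix} f_1 \\ f_2 \\ e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \theta_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & \theta_{44} \end{pmatrix} \right)$$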
In EFA, which items measure which dimensions is the outcome, in CFA it is the input.
In EFA questionnaire items aim globally at a broad concept, in CFA each questionnaire
item is designed to tackle a specific dimension of the concept.
83
Random and systematic error. Reliability and validity assessment with
CFA
Reliability: Extent to which a measurement procedure would yield the same result
upon several independent trials under identical conditions. In other words, low random
measurement error (any systematic error would replicate). Random measurement error is
a problem for OLS regression but not for SEM with multiple indicators, because it is
accounted for by the $\theta$ parameters.
Validity: Extent to which a measurement procedure measures what it is intended to
measure and only what it is intended to measure, except for random measurement error.
In other words, absence of systematic error.
Assuming the validity of v, its reliability is the percentage of variance explained by f.
Always follow this golden rule:
Estimate reliability after validity has been diagnosed.
Test the specification of measurement equations in a CFA model prior to specifying
equations relating factors. Otherwise, relationships among factors might be biased
(specification errors) or even meaningless (invalidity).
84
Construct validation: Estimate a CFA model that assumes validity...
All items load on the factor they are supposed to measure (a second loading is a sign
of measuring another factor which is in the model).
No error correlations are specified (error correlations contain common unknown
variance, a sign of measuring an unknown factor which is not in the model).
....and diagnose its goodness of fit. You can never be certain of validity, but a CFA model
can help detect signs of invalidity such as:
It does not correctly reproduce the covariance matrix (additional loadings or error
correlations are needed, thus revealing mixed items or additional necessary dimensions).
Some variables have a reliability ($\kappa$) too low to be attributed solely to random error (convergent
invalidity).
Some factors have correlations very close to unity (discriminant invalidity).
Some factors have correlations of unexpected signs or magnitudes (nomological
invalidity).
85
Modelling stages in SEM
Verbal theories
1) SPECIFICATION
2) IDENTIFICATION
Estimable model
3) DATA COLLECTION
Exploratory data analysis. Computation of S
4) ESTIMATION
Methods to fit $\Sigma(p)$ to S
5) FIT DIAGNOSTICS
Discrepancies between $\Sigma(p)$ and S
ADEQUATE?
NO: MODIFICATION, then back to specification
YES: continue
6) UTILIZATION
Theory validation, validity and reliability assessment ...
86
Theoretical and statistical grounds
1) Specification
Formal establishment of a statistical model: set of statistical and substantive
assumptions that structure the data according to a theory.
Equations: one or two of the following systems of equations:
Relating factors or error free variables to one another (structural equations).
Relating factors to indicators with error (measurement equations).
Parameters: two types:
o Free (unknown and freely estimated).
o Fixed (known and constrained to a given value, usually 0 or 1).
The amount of the researcher's prior knowledge will affect the modelling strategy:
If this knowledge is exhaustive and detailed, it will be easily translated into a model
specification. The researcher's aim will simply be to use the data to estimate and
confirm or reject the model (confirmatory strategy).
If this knowledge is less exhaustive and detailed, the fixed or free character of a
number of parameters will be dubious. This will lead to a model modification process
by repeatedly going through the modelling stages (exploratory strategy).
87
Full SEM model:
88
Equations:
StfCntry = $\beta_{31}$ * SocialTrust + d3
PolTrust = $\beta_{21}$ * SocialTrust + d2
ppltrst = 1 * SocialTrust + e1
pplfair = $\lambda_{21}$ * SocialTrust + e2
pplhlp = $\lambda_{31}$ * SocialTrust + e3
trstprl = 1 * Political Trust + e4
trstplt = $\lambda_{52}$ * Political Trust + e5
trstlgl = $\lambda_{62}$ * Political Trust + e6
stfeco = $\lambda_{73}$ * StfCntry + e7
stfgov = $\lambda_{83}$ * StfCntry + e8
stfdem = 1 * StfCntry + e9
Total number of parameters = 20
89
2) Identification
Can model parameters be derived from variances and covariances?
Identification must be studied prior to data collection
If a model is not identified:
o Seek more restrictive specifications with additional constraints (if theoretically
justifiable).
o Add more indicators or more exogenous factors.
Identification conditions
o Underidentification (df<0): infinite number of solutions that make S equal $\Sigma(p)$.
o Possibly identified (df=0): there may be a unique solution that makes S equal
$\Sigma(p)$. This type of model is less interesting in that its restrictions are not
testable.
o Possibly overidentified (df>0): there may be a unique solution that minimizes
discrepancies between S and $\Sigma(p)$. Only these models, more precisely their
restrictions, can be tested from the data.
90
Example Full SEM model:
9 observed variables lead to (9*10/2)=45 variances and covariances: possibly
overidentified model.
Total number of parameters = 20
Degrees of Freedom = 45-20 = 25 > 0
91
3) Data collection and exploratory analyses
Valid sampling methods
In their standard form, SEM assumes simple random sampling. Extensions to stratified
and cluster samples have been recently developed. In any case, they must be random
samples.
Sample size
Sample sizes in the 200-500 range are usually enough. Sample requirements increase:
For smaller R2 and percentages of explained variance.
When collinearity is greater.
For smaller numbers of indicators per factor (especially less than 3).
Under non normality, the required sample size is larger (in the 400-800 range).
Outlier and non-linearity detection
As with any other type of statistical modelling, outliers and non-linear
relationships must first be detected by means of exploratory data analysis.
92
4) Estimation
First estimate the sample variances and covariances (S) and then find the best fitting
p parameter values.
A fit function related to the size of the residuals in $S-\Sigma(p)$ is minimized.
Each choice of fit function results in an alternative estimation method. One of
these choices leads to the maximum likelihood (ML) estimator, which is the most
often used.
Estimation assumes that a covariance matrix is analyzed. Estimates obtained
from a correlation matrix are correct only under very specific conditions.
93
Normality assumed: ML and GLS
Normality not assumed: ULS, Scale-free LS, ADF
94
Examining the ordered correlation matrix
Let us look at the correlations as well and spot low correlations between items measuring the
same dimension, or large correlations between items measuring different dimensions:
95
Amos output
Variable Summary (Group number 1)
96
Variable counts (Group number 1)
97
Estimates (Group number 1 - Default model)
98
Standardized Regression Weights: Variances:
99
Squared Multiple Correlations:
Estimate
StfCntry ,808
PolTrust ,818
stfeco ,644
stfgov ,799
stfdem ,640
ppltrst ,194
pplfair ,187
pplhlp ,156
trstprl ,769
trstplt ,650
trstlgl ,601
100
Implied (for all variables) Correlations (Group number 1 - Default model)
SocialTrust StfCntry PolTrust stfeco stfgov stfdem ppltrst pplfair pplhlp trstprl trstplt trstlgl
SocialTrust 1,000
StfCntry ,899 1,000
PolTrust ,904 ,813 1,000
stfeco ,722 ,803 ,653 1,000
stfgov ,803 ,894 ,727 ,717 1,000
stfdem ,719 ,800 ,650 ,642 ,715 1,000
ppltrst ,440 ,396 ,398 ,318 ,354 ,316 1,000
pplfair ,432 ,389 ,391 ,312 ,347 ,311 ,190 1,000
pplhlp ,396 ,356 ,358 ,285 ,318 ,284 ,174 ,171 1,000
trstprl ,793 ,713 ,877 ,572 ,637 ,570 ,349 ,343 ,314 1,000
trstplt ,729 ,655 ,806 ,526 ,586 ,524 ,321 ,315 ,288 ,707 1,000
trstlgl ,701 ,630 ,775 ,506 ,563 ,504 ,309 ,303 ,277 ,679 ,625 1,000
101
5) Fit diagnostics
Interpretation does not proceed until the goodness of fit has been assessed.
The fit diagnostics attempt to determine if the model is correct and useful.
o Correct model: its restrictions are true in the population. Relationships are
correctly specified without the omission of relevant parameters.
o In a correct model, the differences between S and (p) are small and random.
o Correctness must not be strictly understood. A model must be an approximation
of reality, not an exact copy of it.
o Thus, a good model will be a compromise between parsimony and
approximation.
Diagnostics will usually do well at distinguishing really badly fitting models from fairly
well fitting models. Many models will fit fairly well (even exactly equally well if
equivalent) and will be hard to distinguish statistically. They can only be distinguished
theoretically.
102
The $\chi^2$ goodness of fit statistic
Null hypothesis: the model is correct, without omitted relevant parameters:
$H_0: \Sigma=\Sigma(\pi)$
Sample size and power of the test are often high. Researchers are usually willing to
accept approximately correct models with small misspecifications, which are rejected
due to the high power. Quantifying the degree of misfit is more useful than testing the
hypothesis of exact fit.
103
5.1 Global diagnostics
First look for serious problems (common for small samples, very badly fitting
models, and models with two indicators per factor):
o Lack of convergence of the estimation algorithm.
o Underidentification.
Notes for Model (Default model)
The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 1 additional
constraint.
104
Fix non-significant negative variances to zero:
105
The Tucker and Lewis (1973) index (TLI) and Bentler's (1990) comparative fit index
(CFI) introduce the degrees of freedom of the base ($g_b$) and researcher ($g$) models to
account for parsimony. They will increase after adding parameters only if the $\chi^2$
statistic decreases more substantially than $g$.
$$TLI = \frac{\dfrac{\chi_b^2}{g_b} - \dfrac{\chi^2}{g}}{\dfrac{\chi_b^2}{g_b} - 1} \qquad CFI = \min\left[ \frac{(\chi_b^2 - g_b) - (\chi^2 - g)}{\chi_b^2 - g_b} ;\ 1 \right]$$
$$RMSEA = \sqrt{\frac{\max(\chi^2 - g ;\ 0)}{N g}}$$
RMSEA values below 0.05 are considered acceptable.
The sampling distribution of the RMSEA is known, which makes it possible to compute confidence
intervals and test the hypothesis of approximate fit. If both extremes of the interval
are larger than 0.05, a very bad fit can be concluded. If both extremes are below 0.05,
a very good fit can be concluded.
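The three indices are simple functions of the χ² statistics and degrees of freedom of the base and researcher models. The sketch below implements the formulas as written above, with N·g in the RMSEA denominator as on the slide (some programs use (N-1)·g instead). The χ² values in the usage example are hypothetical and serve only to show the calls. The sample size of 2330 is the one reported in the Amos notes.

```python
import math


def tli(chi2_base, g_base, chi2, g):
    """Tucker-Lewis index from the chi-square/df ratios of both models."""
    return (chi2_base / g_base - chi2 / g) / (chi2_base / g_base - 1)


def cfi(chi2_base, g_base, chi2, g):
    """Comparative fit index, capped at 1 as in the slide formula."""
    return min(((chi2_base - g_base) - (chi2 - g)) / (chi2_base - g_base), 1.0)


def rmsea(chi2, g, n):
    """Root mean square error of approximation with N*g in the denominator."""
    return math.sqrt(max(chi2 - g, 0.0) / (n * g))


# Hypothetical chi-square values, for illustration only.
chi2_base, g_base = 3000.0, 15   # base (independence) model
chi2_model, g_model = 40.0, 8    # researcher's model
n = 2330                         # sample size reported in the Amos notes

print(f"TLI   = {tli(chi2_base, g_base, chi2_model, g_model):.3f}")
print(f"CFI   = {cfi(chi2_base, g_base, chi2_model, g_model):.3f}")
print(f"RMSEA = {rmsea(chi2_model, g_model, n):.3f}")
```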
106
5.2. Detailed diagnostics
Are standardized estimated values reasonable and of the expected sign?
Are there significant residuals that suggest the addition of parameters? (To
obtain them in Amos: Analysis properties \ Output \ Residual moments). The values
are t-values.
Each residual covariance has been divided by an estimate of its standard error. In sufficiently large
samples, these standardized residual covariances have a standard normal distribution if the model is
correct. So, if the model is correct, most of them should be less than two in absolute value.
Are there low R2 values suggesting the omission of explanatory variables, or low
reliabilities ($\kappa$) suggesting a lack of validity? (To obtain them in Amos: Analysis properties \
Output \ Squared Multiple Correlations)
107
The modification index is an individual significance test of omitted parameters (H0:
the omitted parameter is zero in the population).
o Reject the hypothesis if the index exceeds the critical χ² value with 1 df (3.84 for a
5% type I risk).
o Always consider the expected standardized parameter estimate and its sign: if
power is high, parameters of a substantively negligible size can be statistically
significant. Only add parameters of a substantial size.
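The two criteria above (statistical significance and substantial size) can be sketched as follows. With 1 df, the upper-tail probability of a χ² value x equals erfc(√(x/2)), so only the standard library is needed. The function names and the minimum-size threshold `min_abs_change` are hypothetical; the source does not give a numerical cut-off for "substantial size", so it must be chosen by the analyst.

```python
import math

CRITICAL_CHI2_1DF = 3.84  # 5% type I risk, as stated above


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    With 1 df, chi2 is the square of a standard normal variable,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_candidate(mod_index: float, expected_change: float,
                 min_abs_change: float) -> bool:
    """True if the omitted parameter is both significant and of substantial size.

    min_abs_change is a hypothetical, analyst-chosen threshold.
    """
    significant = mod_index > CRITICAL_CHI2_1DF
    substantial = abs(expected_change) >= min_abs_change
    return significant and substantial


print(round(p_value_1df(3.84), 3))  # approximately 0.05
```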
Residuals and modification indices can suggest the addition of parameters in order
to improve fit. A model can also be improved by dropping irrelevant parameters
(parsimony principle). The usual t statistic tests the significance of included
parameters (H0: the included parameter is zero in the population).
o Non-significant disturbance covariances and measurement error covariances
should be dropped from the model. Non-significant parameters may be dropped
from the model if their theoretical justification is weak. Non-significant
parameters reveal invalidity.
108
Standardized Residual Covariances (Group number 1 - Default model)
109
Modification Indices
110
Model Fit Summary
RMR, GFI
Baseline Comparisons
RMSEA
111
5.3. Model modification. Capitalization on chance
Frequently models fail to pass the diagnostics.
Which modifications should be introduced, and in which order?
o Introduce modifications one at a time, and carefully examine results before
introducing the next. One modification can modify the need for another.
o First improve fit (add parameters). Then improve parsimony (drop parameters).
o Disregard high modification indices with very small expected estimates.
o Consider models with good descriptive fit indices, even if the χ² test rejects them
(parsimony-approximation compromise).
o Avoid adding theoretically uninterpretable parameters, no matter how significant.
o Make few modifications.
o The selected model must pass the diagnostics and be theoretically relevant and useful.
o Modified models can be compared with CFI and RMSEA.
112
Model modification has some undesirable statistical consequences, especially if
modifications are made blindly, using only statistics and no theory.
Even if model modification has been done carefully, modifications are based on a
particular sample. Have we reached a model that fits the population?
Bias of estimates and significance tests: only large and significant parameters have
been considered to be candidates for addition.
The introduction of modifications that improve the fit to the sample but not to the
population is known as capitalization on chance.
The only solution is to check that the model fits well beyond the particular sample
used:
o Crossvalidation: estimation and goodness of fit test of the model on an
independent sample of the same population. If only one sample is available, it
can be split: the first half is used for model modification and the second for
validation. Crossvalidation is successful if the model fits the second sample
reasonably well.
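The split-sample variant can be sketched as a random division of the cases into two halves. This is only an illustration: the function name, the seed and the use of row indices as stand-ins for cases are assumptions, and the SEM estimation itself is not shown.

```python
import random


def split_half(cases, seed=0):
    """Randomly split the cases into a modification half and a validation half."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical data set of 10 cases, identified by their row index.
modification_half, validation_half = split_half(range(10))
print("modify the model on:", sorted(modification_half))
print("validate the final model on:", sorted(validation_half))
```

The model is modified on the first half only; the final model is then estimated once on the second half and its fit is examined without further changes.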
113
Complete Example Modeling Stages:
CFA model
114
trstprl = 1 * Political Trust + e4
trstplt = λ52 * Political Trust + e5
trstlgl = λ62 * Political Trust + e6
stfeco = λ73 * StfCntry + e7
stfgov = λ83 * StfCntry + e8
stfdem = 1 * StfCntry + e9
115
The model has 6×7/2 = 21 variances and covariances, and 13 parameters (6 error
variances, 2 factor variances, 1 factor covariance, 4 loadings): 21 − 13 = 8 degrees of freedom.
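The count can be reproduced in a few lines. The function name is hypothetical; the numbers are the ones given above.

```python
def degrees_of_freedom(n_observed: int, n_free_parameters: int) -> int:
    """Distinct variances and covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters


# 6 error variances + 2 factor variances + 1 factor covariance + 4 free loadings
n_parameters = 6 + 2 + 1 + 4

print(degrees_of_freedom(6, n_parameters))  # 21 - 13 = 8
```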
Each factor has at least 2 pure indicators: the measurement part is identified.
In the complete model with parameters, the factors are related in a recursive system
without error covariances: it is identified.
This model only has measurement equations. One loading per factor is fixed to 1 to set
the scale of the factor (trstprl for Political Trust and stfdem for StfCntry in the
equations above). The remaining loadings are free. Each observed variable also has
an error variance. By default all factor variances and covariances are free. To
constrain the factors to be uncorrelated, one would fix the factor covariance to a
value of 0:
116
Estimation
Your model contains the following variables (Group number 1)
117
Parameter Summary (Group number 1)
118
Sample Correlations (Group number 1)
120
Squared Multiple Correlations: (Group number 1 - Default model). This is R²
          Estimate
stfeco    ,631 = ,795²
stfgov    ,816 = ,903²
stfdem    ,632 = ,795²
trstprl   ,769 = ,877²
trstplt   ,656 = ,810²
trstlgl   ,595 = ,771²
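For an indicator that loads on a single factor, R² is the square of its standardized loading, which is what the annotations above show. A quick check in Python, with the values copied from the table (decimal commas written as points); the tolerance of 0.002 is an assumption that allows for rounding in the printed output.

```python
# (reported R², standardized loading) per indicator, copied from the table above.
ESTIMATES = {
    "stfeco": (0.631, 0.795),
    "stfgov": (0.816, 0.903),
    "stfdem": (0.632, 0.795),
    "trstprl": (0.769, 0.877),
    "trstplt": (0.656, 0.810),
    "trstlgl": (0.595, 0.771),
}

TOLERANCE = 0.002  # assumed allowance for rounding to three decimals

for name, (r2, loading) in ESTIMATES.items():
    squared = loading ** 2
    status = "ok" if abs(squared - r2) < TOLERANCE else "differs"
    print(f"{name}: loading^2 = {squared:.3f}, reported R2 = {r2:.3f} ({status})")
```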
121
Estimate
PolTrust <--> StfCntry ,843 ### Smaller than 1 ###
122
          StfCntry  PolTrust  stfeco  stfgov  stfdem  trstprl  trstplt  trstlgl
stfeco    ,795      ,670      1,000
stfgov    ,903      ,761      ,718    1,000
stfdem    ,795      ,670      ,632    ,718    1,000
trstprl   ,739      ,877      ,587    ,667    ,587    1,000
trstplt   ,682      ,810      ,542    ,616    ,542    ,710     1,000
trstlgl   ,650      ,771      ,516    ,587    ,517    ,676     ,624     1,000
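In a CFA model without error covariances, the implied correlation between two indicators is the product of their standardized loadings, multiplied by the factor correlation when they belong to different factors. The sketch below reproduces some entries of the table from the standardized loadings and the factor correlation of ,843 reported earlier (decimal commas written as points). The function name is hypothetical, and small differences in the third decimal are expected because the inputs are rounded.

```python
# Indicator -> (factor, standardized loading), taken from the estimates above.
LOADINGS = {
    "stfeco": ("StfCntry", 0.795),
    "stfgov": ("StfCntry", 0.903),
    "stfdem": ("StfCntry", 0.795),
    "trstprl": ("PolTrust", 0.877),
    "trstplt": ("PolTrust", 0.810),
    "trstlgl": ("PolTrust", 0.771),
}
FACTOR_CORRELATION = 0.843  # PolTrust <--> StfCntry


def implied_correlation(a: str, b: str) -> float:
    """Implied correlation between two different indicators (no error covariances)."""
    factor_a, loading_a = LOADINGS[a]
    factor_b, loading_b = LOADINGS[b]
    phi = 1.0 if factor_a == factor_b else FACTOR_CORRELATION
    return loading_a * phi * loading_b


for pair in [("stfeco", "stfgov"), ("trstprl", "trstplt"), ("trstprl", "stfeco")]:
    print(pair, round(implied_correlation(*pair), 3))
```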
123
FIT DIAGNOSTICS
Model Fit Summary
CMIN
RMR, GFI
Baseline Comparisons
AIC
125
Matrices (Group number 1 - Default model)
                     M.I.     Par Change
e7 <--> StfCntry     8,471    ,111
e7 <--> PolTrust     12,506   -,164
e8 <--> PolTrust     5,674    ,096
e9 <--> e7           10,516   ,146
e9 <--> e8           6,049    -,096
e4 <--> StfCntry     8,131    -,106
e4 <--> PolTrust     5,407    ,099
e4 <--> e7           5,200    -,101
e4 <--> e9           12,892   -,164
e5 <--> e7           7,757    -,125
e5 <--> e8           22,745   ,193
e5 <--> e9           6,924    -,122
e5 <--> e4           4,317    ,088
e6 <--> StfCntry     8,911    ,135
e6 <--> PolTrust     6,250    -,134
e6 <--> e8           19,909   -,211
e6 <--> e9           62,688   ,427
e6 <--> e5           13,862   -,193
If you repeat the analysis treating the covariance between e7 and StfCntry as a free parameter, its
estimate will become larger by approximately 0,111 than it is in the present analysis.
PROBLEM: Change in covariances, the change in correlations is not known!
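Since modifications should be introduced one at a time, a natural first step is to rank the covariance modification indices in the table and look at the largest one. The sketch below does this with the values copied from the table (decimal commas written as points); the variable names are hypothetical, and the ranking is only a statistical aid, as the theoretical interpretability of each candidate still has to be judged separately.

```python
CRITICAL_CHI2_1DF = 3.84  # critical chi-square value with 1 df, 5% type I risk

# (variable 1, variable 2, modification index, expected parameter change)
MOD_INDICES = [
    ("e7", "StfCntry", 8.471, 0.111), ("e7", "PolTrust", 12.506, -0.164),
    ("e8", "PolTrust", 5.674, 0.096), ("e9", "e7", 10.516, 0.146),
    ("e9", "e8", 6.049, -0.096), ("e4", "StfCntry", 8.131, -0.106),
    ("e4", "PolTrust", 5.407, 0.099), ("e4", "e7", 5.200, -0.101),
    ("e4", "e9", 12.892, -0.164), ("e5", "e7", 7.757, -0.125),
    ("e5", "e8", 22.745, 0.193), ("e5", "e9", 6.924, -0.122),
    ("e5", "e4", 4.317, 0.088), ("e6", "StfCntry", 8.911, 0.135),
    ("e6", "PolTrust", 6.250, -0.134), ("e6", "e8", 19.909, -0.211),
    ("e6", "e9", 62.688, 0.427), ("e6", "e5", 13.862, -0.193),
]

# Keep the significant indices and sort them from largest to smallest.
ranked = sorted(
    (row for row in MOD_INDICES if row[2] > CRITICAL_CHI2_1DF),
    key=lambda row: row[2],
    reverse=True,
)

for var1, var2, mi, change in ranked[:3]:
    print(f"{var1} <--> {var2}: M.I. = {mi:.3f}, expected change = {change:+.3f}")
```

The largest index by a wide margin is the covariance between e6 and e9, which also has the largest expected change.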
127
                      M.I.     Par Change
stfeco  <--- trstprl  4,641    -,027
stfeco  <--- trstplt  6,938    -,036
stfgov  <--- trstplt  11,171   ,041
stfdem  <--- trstlgl  22,102   ,059
trstprl <--- stfeco   5,041    -,031
trstprl <--- stfdem   8,843    -,040
trstplt <--- trstlgl  4,962    -,027
trstlgl <--- stfdem   28,944   ,085
128
Model modification:
129
Variable Summary (Group number 1)
130
Sample Moments (Group number 1)
131
Notes for Model (Default model)
132
Estimates (Group number 1 - Default model)
Estimate
trstlgl <--- PolTrust ,765
trstplt <--- PolTrust ,812
trstprl <--- PolTrust ,881
stfdem <--- StfCntry ,787
stfgov <--- StfCntry ,910
stfeco <--- StfCntry ,793
133
Covariances: (Group number 1 - Default model)
Estimate
PolTrust <--> StfCntry ,835
e6 <--> e9 ,199
134
Squared Multiple Correlations: (Group number 1 - Default model)
Estimate
stfeco ,628
stfgov ,827
stfdem ,619
trstprl ,776
trstplt ,659
trstlgl ,585
135
Implied (for all variables) Correlations (Group number 1 - Default model)
136
Implied Correlations (Group number 1 - Default model)
137
Modification Indices (Group number 1 - Default model)
138
Regression Weights: (Group number 1 - Default model)
CMIN
139
RMR, GFI
Baseline Comparisons
RMSEA
140
References
Anderson, J.C., & Gerbing, D.W. (1988), Structural equation modeling in practice: A review and recommended two-step approach. Psychological
Bulletin, 103(3), 411423.
Bagozzi, R.P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of Academy of Marketing Science, 6(1), 7494.
Chin, W.W. and Newsted, P.R. (1999), Structural equation modeling analysis with small samples using partial least squares, in Hoyle, R.R. (Ed.),
Statistical Strategies for Small Sample Research, Sage Publications, Thousand Oaks, CA, pp. 307-41.
Churchill, G. (1979), A Paradigm for Developing Better Measures of Marketing Constructs, Journal of Marketing Research, Vol. 16 (February
1979), pp. 64-73
Fornell, C. and Larcker, D.F. (1981), Evaluating structural equation models with unobservable variables and measurement error, Journal of
Marketing Research, Vol. 18, No. 1, pp. 39-50.
Gounaris, S., Dimitriadis, S., 2003. Assessing service quality on the web: evidence from business-to-consumer portals. Journal of Services
Marketing 17 (4/5), 529548.
Grewal D, Monreo KB, Krishnan R. The effects of price-comparison advertising on buyers perceptions of acquisition value, transaction value,
and behavioral intentions. J Market 1998;62(2):4659.
Hair, J.F., Jr, Anderson, R.E., Tatham, R.L., & Black, W.C. (1999). Multivariate data analysis. London: Prentice Hall.
Hulland, J. (1999), Use of partial least squares (PLS) in strategic management research: a review of four recent studies, Strategic Management
Journal, Vol. 20, No. 2, pp. 195-204.
Ladhari, R. (2010): Developing e-service quality scales: a literature review, Journal of Retailing and Consumer Services, Vol 17. pp. 464-477.
Ledden, L., Kalafatis, S., Samouel, P. (2007), The relationship between personal values and perceived value of education, Journal of Business
Research 60 (2007) 965974
Nunnally, J.C. and Bernstein, I.H. (1994), Psychometric Theory, 3rd ed., McGraw-Hill, New York, NY.
Ribbink, D., van Riel, A., Liljander, V. and Streukens, S. (2004), Comfort your online customer: quality, trust and loyalty on the internet,
Managing Service Quality, Vol. 14, No. 6, pp. 446-456
White, J.C., Varadarajan, P.R. and Dacin, P.A. (2003), Market situation interpretation and response: the role of cognitive style, organizational
culture, and information use, Journal of Marketing, Vol. 67, No. 3, pp. 63-79.
Wolfinbarger, M., Gilly, M.C., 2003. ETailQ: dimensionalizing, measuring and predicting retail quality. Journal of Retailing 79 (3), 183198.
141
Amos Graphics