Structural Equation Modelling For Small Samples

Structural Equation Modelling for small samples
Michel Tenenhaus
HEC School of Management (GRECHEC),
1 rue de la Libration, Jouy-en-Josas, France [tenenhaus@hec.fr]
Abstract
Two complementary schools have come to the fore in the field of Structural Equation Modelling
(SEM): covariance-based SEM and component-based SEM.
The first approach developed around Karl Jreskog. It can be considered as a generalisation of both
principal component analysis and factor analysis to the case of several data tables connected by
causal links.
The second approach developed around Herman Wold under the name "PLS" (Partial Least
Squares). More recently Hwang and Takane (2004) have proposed a new method named
Generalized Structural Component Analysis. This second approach is a generalisation of principal
component analysis (PCA) to the case of several data tables connected by causal links.
Covariance-based SEM is usually used with an objective of model validation and needs a large
sample (what is large varies from an author to another: more than 100 subjects and preferably more
than 200 subjects are often mentioned). Component-based SEM is mainly used for score
computation and can be carried out on very small samples. A research based on 6 subjects has been
published by Tenenhaus, Pags, Ambroisine & Guinot (2005) and will be used in this paper.
In 1996, Roderick McDonald published a paper in which he showed how to carry out a PCA using
the ULS (Unweighted Least Squares) criterion in the covariance-based SEM approach. He
concluded from this that he could in fact use the covariance-based SEM approach to obtain results
similar to those of the PLS approach, but with a precise optimisation criterion in place of an
algorithm with not well known properties.
In this research, we will explore the use of ULS-SEM and PLS on small samples. First experiences
have already shown that score computation and bootstrap validation are very insensitive to the
choice of the method. We will also study the very important contribution of these methods to multiblock analysis.
Key words: Multi-block analysis, PLS path modelling, Structural Equation Modelling, Unweighted
Least Squares
Introduction
Compare to covariance-based SEM, PLS suffers from several handicaps: (1) the diffusion of path
modelling softwares is much more confidential than that of covariance-based SEM softwares, (2) the
PLS algorithm is more an heuristic than an algorithm with well known properties and (3) the
possibility of imposing value or equality constraints on path coefficients is easily managed in
covariance-based SEM and does not exist in PLS. Of course, PLS has also some advantages on
covariance-based SEM (thats why PLS exists) and we can list some of them: systematic
convergence of the algorithm due to its simplicity, possibility of managing data with a small number
of individuals and a large number of variables, practical meaning of the latent variable estimates,
general framework for multi-block analysis.
It is often mentioned that PLS is to covariance-based SEM as PCA is to factor analysis. But the
situation has seriously changed when Roderick McDonald showed in his 1996 seminal paper that he
could easily carry out a PCA with a covariance-based SEM software by using the ULS (Unweighted
Least Squares) criterion and cancelling the measurement error variances. Furthermore, the
estimation of the latent variables proposed by McDonald is similar to using the PLS mode A and the
SEM scheme (i.e. using the theoretical latent variables as inner LV estimates). Thus, it became
possible to use a covariance-based SEM software to mimic PLS.
In the first section of this paper, it is reminded how to use the ULS criterion for covariance-based
SEM and the PLS way of estimating latent variables for mimicking PLS path modelling. Then, the
second section is devoted to show how to carry out a PCA with a covariance-based SEM software
and to comment the interest of this approach for taking into account parameter constraints and for
bootstrapping. Multi-block analysis is presented in the third section as a confirmatory factor
analysis.
We have used AMOS 6.0 (Arbuckle, 2005) and XLSTAT-PLSPM, a module of the XLSTAT
software (XLSTAT, 2007), on practical examples to illustrate the paper. Listing the pluses and
minuses of ULS-SEM and PLS finally concludes the paper.
I.
Using ULS and PLS estimation methods for structural equation modelling
We describe in this section the use of the ULS estimation method applied to the SEM parameter
estimates and that of the PLS estimation method for computing the LV values.
In a first part we remind the structural equation model following Bollen (1989). A structural
equation model consists of two models: the latent variable model and the measurement model.
The latent variable model
Let be a column vector consisting of m endogenous (dependent) centred latent variables, and a
column vector consisting of k exogenous (independent) centred latent variables. The structural
model connecting the vector to the vectors and is written as
= B + +
(1)
where B is a zero-diagonal m m matrix of regression coefficients, a m k matrix of regression

coefficients and a centred random vector of dimension m.
The measurement model
Each latent (unobservable) variable is described by a set of manifest (observable) variables. The
column vector yj of the centred manifest variables linked to the dependent latent variable j can be
written as a function of j through a simple regression with usual hypotheses.
y j = yj j + j
(2)
The column vector y, obtained by concatenation of the yjs, is written as
y = y +
(3)
m
where y = yj is the direct sum of 1y ,..., my and is a column vector obtained by concatenation
j=1
of the js. It may be reminded that the direct sum of a set of matrices A1, A2,, Am is a block
diagonal matrix in which the blocks of the diagonal are formed by matrices A1, A2,, Am.
2
Similarly, the column vector x of the centred manifest variables linked to the latent independent
variables is written as a function of :
(4)
x = x +
Adding the usual hypothesis that the matrix I-B is non-singular, equation (1) can also be written as:
= (I - B) 1 ( + )
(5)
and consequently (3) becomes

y = y [(I - B)1 ( + )] +
(6)
Factorisation of the manifest variable covariance matrix
Let = Cov() = E(), = Cov() = E(), = Cov() = E() and = Cov() = E().
Suppose that the random vectors , , and are independent of each other and that the covariance
matrices , , of the error terms are diagonal. Then, we get:
xx = x x ' + ,
yy = y [(I - B) 1 (' + )][(I - B) ']1 y ' + ,
xy = x ' [ (I - B) '] y '

1
From which we finally obtain:

(7)
yy
=
xy
yx y [(I - B) 1 (' + )][(I - B) ']1 y ' +

=
xx
x ' (I - B) 1 ' y '
y [ (I - B) ] x '
x x ' +
1
Let = { x , y , B, ,, , , } be the set of parameters of the model and () the matrix (7).
Model estimation using the ULS method
Let S be the empirical covariance matrix of the MVs. The object is to seek the set of parameters
,
, B , ,,
,
,
} minimizing the criterion

= {
x
y
(8)
S ()
The aim is therefore to seek a factorisation of the empirical covariance matrix S as a function of the
parameters of the structural model. In SEM softwares, the covariance matrix estimations
n () and
n () of the residual terms are computed in such a way that the diagonal of
= Cov
= Cov
is null, even when it yields to negative variance

the reconstruction error matrix E = S ()
(Heywood case).
,
, B , ,,
, 0, 0) and by the i-th term

Lets denote by ii the i-th term of the diagonal of (
x
y
ii

. From the formula:
of the diagonal of
(9)
sii = ii + ii
we may conclude that ii is the part of the variance sii of the i-th MV explained by its LV (except in
a Heywood case) and ii is the estimate of the variance of the measurement error relative to this
MV. As all the error terms e = s ( + ) are null, this method is not oriented towards the
ii
ii
ii
ii
research of parameters explaining the MV variances. It is in fact oriented towards the reconstruction
of the covariances between the MVs, variances excluded.
The McDonald approach for parameter estimation
In his 96 paper, McDonald proposes to estimate the model parameters subject to the constraints that
,
, B , ,,
minimizing the criterion

all the ii are null. The object is to seek the parameters
x
y
(10)
,
, B , ,,
, 0, 0)

S (
x
y
The estimations of the variances of the residual terms and are integrated in the diagonal terms of
,
, B , ,,
, 0, 0) . This method is therefore oriented

the reconstruction error matrix E = S (
x
y
towards the reconstruction of the full MV covariance matrix, variances included. On a second step,
and
of the variances of the residual terms and are obtained by using again
final estimation
formula (9).
Goodness of Fit
The quality of the fit can be measured by the GFI (Goodness of Fit Index) criterion of Jreskog &
Sorbum, defined by the formula
(11)
GFI = 1
i.e. the proportion of S
,
, B , ,,
,
,
)

S (
x
y
explained by the model. By convention, the model under study is
acceptable when the GFI is greater than 0.90.

,
, B , ,,
,
,
)

The quantity S (
x
y
can be deduced from the CMIN criterion given in
AMOS:
(12)
CMIN =
N 1
,
, B , ,,
,
,
)

S (
x
y
where N is the number of cases.
In practical applications of the McDonald approach, the difference between the GFI given by AMOS
and the exact GFI computed with formula (11) will be small:
GFI = 1
(13)
,
, B , ,,
,
,
)

S (
x
y
2
2
= 1
,
, B , ,,
, 0, 0) 2

S (
ii
x
y
i
Using the McDonald approach, the GFI given by AMOS is equal to

GFI = 1
(14)
and
2
ii
/ S
,
, B , ,,
, 0, 0)

S (
x
y
S
is usually small. Furthermore, the exact GFI will always be larger than the GFI
given by AMOS.
Evaluation of the latent variables
After having estimated the parameters of the model, we now present the problem of evaluating the
latent variables. Three approaches can be distinguished: the traditional SEM approach, the
"McDonald" approach, and the "Fornell" approach. As it is usual in the PLS approach, we now
designate one manifest variable with the letter x and one latent variable with the letter , regardless
of whether they are of the dependent or independent type. The total number of latent variables is
n = k + m and the number of manifest variables related to the latent variable j is pj.
The traditional SEM approach
To construct an estimation j of j , one proceeds by multiple regression of j on the whole set of
the centred manifest variables x x ,..., x x . In other words, if one denotes as the
11
11
npn
npn
xx
implied (i.e. predicted by the structural model) covariance matrix between the manifest variables,
and as x j the vector of the implied covariances between the manifest variables x and the latent
variable j, one obtains an expression of j as a function of the whole set of manifest variables:
(15)
j = X xx1 x
where X = x11 x11 ,..., xnpn xnpn . This method is not really usable, as it is more natural to
estimate a latent variable solely as a function of its own manifest variables.
The "McDonald" approach for LV evaluation

Let x j1 ,..., x jp j be the manifest variables relative to the latent variable j. McDonald (1996) proposes
evaluating the latent variable j with the aid of the formula
(16)
j w jk ( x jk x jk )
k
n ( x , ) is the implied covariance between the MV xjk and the LV j and where
where w jk = Cov
jk
j
means that the left term is the standardized version of the right term.
The regression coefficient jk of the latent variable j in the regression of the manifest variable xjk on
the latent variable j is estimated by
(17)
n ( x , ) / Var
m ( )
jk = Cov
jk
j
j
From this, we deduce that formula (16) can also be written as

(18)
j jk ( x jk x jk )
k
The McDonald approach thus amounts to estimating the latent variable j with the aid of the first
PLS component computed in the PLS regression of the latent variable j on the manifest variables
x j1 ,..., x jp j . This approach could enter into the PLS framework. In the usual PLS approach (Wold
(1985), Tenenhaus, Esposito Vinzi, Chatelin & Lauro (2005)), under mode A, the outer weights are
obtained by simple regression of each variable xjk on the inner estimate zj of the latent variable j.. It
is necessary to calculate expressly the inner estimate zj of j to obtain these weights. Three
procedures are proposed in PLS softwares: the centroid, factorial and structural schemes. The
covariance-based SEM softwares, on the other hand, give directly the weights (loadings) that for
each xjk represent an estimation of the regression coefficient of the "theoretical" latent variable j in
the regression of xjk on j. Consequently, instead of the regression coefficient of the inner estimate
zj, the estimated regression coefficient of the "theoretical" latent variable j can be used. We have
proposed this procedure for calculating the weights based simply on the outputs of a covariancebased SEM software in Tenenhaus, Esposito Vinzi, Chatelin & Lauro (2005). We called it the
"LISREL" scheme and, without knowing it, found the choice of weights proposed by McDonald.
The "Fornell" approach
When all the coefficients jk relative to a latent variable j have the same sign and the manifest
variables are of a comparable order of magnitude, Fornell proposes building up a score taking into
account the level of the manifest variables xjk:
(19)
j = k jk x jk / k jk
This approach is standard in customer satisfaction studies.
Example 1
The following data have been collected by Jrome Pags (ENSAR-INSFA, Rennes). Six orange
juices were selected from the most well-known brands in France. Three products can be stored at
room temperature (Joker, Pampryl and Tropicana all at room temperature (r.t.)) and three others
have to be stored in refrigerated conditions (Fruivita, Pampryl and Tropicana all refrigerated (refr.)).
Table 1 provides an extract of the data. The first nine variables correspond to the physico-chemical
data, the following seven to sensorial assessments and the last 96 variables represent marks of
appreciation of the product given by students at ENSA, Rennes. These figures have already been
used in Tenenhaus, Pags, Ambroisine, Guinot (2005) to illustrate the use of PLS regression and
PLS path modeling on very small samples. In that paper, we have shown how to select a group of
homogenous judges with respect to their preferences. Only this homogenous group of judges will be
used in the present paper.
Table 1: Extract from the orange juice data file
PAMPRYL
r.t.
________
Glucose
Fructose
Saccharose
Sweetening power
pH before processing
pH after centrifugation
Titer
Citric acid
Vitamin C
Smell Intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
Judge 1
Judge 2
Judge 3
.
.
.
Judge 96
TROPICANA
r.t.
_________
FRUIVITA
refr.
_________
JOKER
r.t.
________
TROPICANA
refr.
_________
PAMPRYL
refr.
_________
25.32
27.36
36.45
89.95
3.59
3.55
13.98
.84
43.44
2.82
2.53
1.66
3.46
3.15
2.97
2.60
2.00
1.00
2.00
17.33
20.00
44.15
82.55
3.89
3.84
11.14
.67
32.70
2.76
2.82
1.91
3.23
2.55
2.08
3.32
2.00
3.00
3.00
23.65
25.65
52.12
102.22
3.85
3.81
11.51
.69
37.00
2.83
2.88
4.00
3.45
2.42
1.76
3.38
3.00
3.00
4.00
32.42
34.54
22.92
90.71
3.60
3.58
15.75
.95
36.60
2.76
2.59
1.66
3.37
3.05
2.56
2.80
2.00
2.00
2.00
22.70
25.32
45.80
94.87
3.82
3.78
11.80
.71
39.50
3.20
3.02
3.69
3.12
2.33
1.97
3.34
4.00
4.00
3.00
27.16
29.48
38.94
96.51
3.68
3.66
12.21
.74
27.00
3.07
2.73
3.34
3.54
3.31
2.63
2.90
3.00
1.00
1.00
3.00
3.00
4.00
2.00
4.00
1.00
PLS regression makes possible to link the block Y comprising hedonic data related to the
homogenous block of judges to the block X comprising physico-chemical and sensorial data. One
may, however, wish to take into account the fact that there are actually two blocks of variables
explaining the block Y of hedonic data: block X1 comprising physico-chemical data and block X2
comprising sensorial data.
Let us assume that the sensorial variables depend on the physico-chemical variables and that the
hedonic variables in turn depend on the physico-chemical variables and the sensorial variables. We
can then construct the arrow diagram shown in figure 1. The ULS-SEM approach described above
and the PLS approach proposed by Herman Wold (Wold, 1985, Tenenhaus, Esposito Vinzi, Chatelin
& Lauro (2005)) allow this type of model to be studied.
In the arrow diagram shown in figure 1, we assume that each block of manifest variables is
summarized by a latent variable. The relationship between the manifest variables (observable) and
the latent variables (non-observable) may be formative, i.e. the function of the latent variable is to
7
summarize the manifest variables of the block. This relationship may also be reflective: each
manifest variable is then a reflection of a latent variable existing a priori, a theoretical concept one
would try to outline with measures. The formative mode does not require the blocks to be onedimensional, while that is compulsory for the reflective mode. Here, we are more in a formative
mode for the physico-chemical and sensorial blocks and reflective mode by construction for the
hedonic block. The two modes are indicated by the direction of the arrows in Figure 1.
With regard to the PLS algorithm, it is recommended that the method of calculating outer estimates
of the latent variables is selected depending on the type of relationship between the manifest
variables and their latent variables: Mode A for the reflective type and Mode B for the formative
type (Wold, 1985). The low number of products has obliged us to use Mode A to calculate the outer
estimates of the latent variables (although the mode of relationship between the manifest and latent
variables is formative for the physico-chemical and sensorial blocks).
The ULS-SEM approach presented here is clearly oriented towards the reflective mode. Therefore
this orange juice example will be analyzed with ULS-SEM and PLS approaches using the reflective
mode for the three blocks. Concerning the physico-chemical and sensorial blocks, the direction of
the arrows connecting the MVs to their LVs shown in Figure 1 should thus be inversed.
Figure 1: Theoretical model of relationships between the hedonic, physico-chemical and sensorial
data
Glucose
Fructose
Saccharose
Sweetening power
pH before processing
Titer
Citric acid
VitaminC
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
1.
1
3
Judge 2
Judge 3
#
Judge 96
Use of ULS-SEM
We now use the ULS-SEM approach on the orange juice data. Following McDonald, the
measurement error variances are put to 0. The results are given in Figure 2 and in Table 2.
All manifest variables have been standardized. The value 1 has been given to the path coefficients
related to the manifest variables pH before centrifugation, Sweetness and Judge2.
Figure 2: AMOS 6.0 software output for the orange juice data
1
judge2
.00
e7
.00
Saccharose
.89
.92
Sweetening power
.00
.22
1.00
.96
.88
PHYSICO-CHEMICAL
1.00
1
e3
pH after centrifuga.
.00
-.88
-.06
e38
HEDONIC
e14
.00
.00
.00
.00
Odor typicity
1
e12
e11
e10
.00
.64
.93
SENSORIAL
.82
Acidity
judge68
e31
.00
e32
judge84
e33
judge86
Sweetness
e34
Judge92
1
judge96
judge91
e35
e36
e39
.00
.00
.00
judge79
1.00
.00
e30
Bitterness
e29
judge77
1
e8
e28
judge63
.70 1.06
1
e9
-.97
.00
.00
e27
judge60
.81
-.95
e26
.97
1.12
-.57
Taste intensity
judge59
.75
.86
.57
.00
e25
judge55
1.07
.66
Pulp
1
.00
.24
e13
.94
d1
.00
.59
.30
Smell intensity
e24
judge52
1.06
Vitamin C
.00
1
judge48
.90
.00
e23
1.06
.00
1
judge35
d2
Citric acid
.00
e22
judge31
.75
.03
.78
e21
.76
.22
Titer
e37
.00
-.87
1
e1
.00
1
judge30
1.04
e2
.00
e20
judge25
.97
.00
1.05
1.00
.00
e19
judge12
pH before centrifuga.
.00
e18
judge11
e4
.00
e17
judge6
-.76
e5
-.77
Fructose
.00
e16
judge3
e6
.00
Glucose
.00
.00
e15
.00
.00
.00
.00
.00
Table 2: Outputs of AMOS 6.0

Table 2.1: Regression Weights (non significant weights in bold):
Parameter
SENSORIAL
HEDONIC
HEDONIC
Glucose
Fructose
Saccharose
Sweetening power
pH before centrifug.
pH after centrifug.
Titer
Citric acid
Vitamin C
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
Judge2
Judge3
Judge6
Judge11
Judge12
Judge25
Judge30
Judge31
Judge35
Judge48
Judge52
Judge55
Judge59
Judge60
Judge63
Judge68
Judge77
Judge79
Judge84
Judge86
Judge91
Judge92
Judge96
<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<---
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
SENSORIAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
PHYSICO-CHEMICAL
SENSORIAL
SENSORIAL
SENSORIAL
SENSORIAL
SENSORIAL
SENSORIAL
SENSORIAL
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
HEDONIC
Estimate
.784
.216
.643
-.765
-.764
.890
.219
1.000
.998
-.869
-.877
-.064
.244
.935
.657
-.565
-.946
-.974
1.000
1.000
.956
.879
1.051
.975
1.045
.758
.747
1.063
.896
1.060
.937
.593
1.069
.747
.855
.575
.975
1.120
.809
1.058
.702
.821
Lower (90%)
.642
-.522
.216
-1.228
-1.224
.438
-.699
1.000
.926
-1.221
-1.225
-.703
-.503
.676
-.154
-1.416
-1.483
-1.154
1.000
1.000
.471
.178
.490
.482
.562
.000
.000
.639
.397
.546
.485
-.236
.669
.000
.000
-.222
.482
.594
.026
.541
-.075
.225
Upper (90%)
1.149
.672
1.653
-.292
-.287
1.353
.894
1.000
1.057
-.371
-.402
1.044
1.028
1.278
1.057
.012
-.719
-.774
1.000
1.000
1.703
1.803
1.984
1.480
2.342
2.226
1.173
1.483
1.264
2.015
1.354
1.418
1.989
1.173
2.165
1.927
1.480
1.994
1.675
1.772
1.900
1.562
P
.010
.412
.064
.010
.010
.010
.799
...
.010
.020
.020
.818
.737
.010
.169
.112
.010
.010
...
...
.010
.018
.010
.010
.010
.070
.061
.010
.018
.010
.018
.286
.010
.061
.068
.316
.010
.020
.050
.010
.180
.040
Comments:
1) Confidence intervals are computed by bootstrapping on 200 bootstrap samples.

2) The percentile 5% of the empirical distribution of the 200 bootstrap estimates is given in column
Lower and the 95% one in column Upper.
3) P is the p-value in the following sense: the smallest interval containing the value 0 is obtained by
using a confidence interval of order 1-P.
4) The regression weights for judges 59, 77 and 92 are not significant. They appear to be the least
correlated to the first principal component of the hedonic data as it can be seen in Figure 3.
10
Tableau 2.2: Variances

Parameter
PHYSICO-CHEMICAL
d1
d2
Estimate
.921
.298
.034
Lower
.429
.028
.000
Upper
1.120
.364
.044
P
.020
.010
.177
Tableau 2.3: Squared Multiple Correlations

Parameter
SENSORIAL
HEDONIC
Estimate
.655
.946
Lower
.529
.919
Upper
.951
1.000
P
.010
.020
Tableau 2.4: Model Fit Summary

Model
Default model
NPAR
42
CMIN
105.613
Model
Default model
RMR
.175
GFI
.904
AGFI
.898
Comment: The GFI is equal to .904 and suggests that the model is acceptable.
Figure 3: Loading plot for the PCA of judges
11
PGFI
.855
The main objective of component-based SEM is the construction of scores. Following the
McDonald approach, we use the path coefficients given in Figure 2 and in Table 2.1. We obtain the
following constructs:
For the Physico-chemical block
Score(Physico-Chemical) -.765*Glucose -.764*Fructose +.890*Saccharose
+.219*(Sweetening power) + 1*(pH before centrifugation) + .998*(pH after
centrifugation) -.869*Titer -.877*(Citric acid) -.064*(Vitamin C)
where all the variables (score and manifest variables) are standardized.
For the Sensorial block
Score(Sensorial) .244*(Smell intensity) + .935*(Odor typicity) +
.657*Pulp -.565*(Taste intensity) - .946*Acidity -.974*Bitterness +
1*Sweetness
with the same standardization than for the previous block.

For the Hedonic block
Score(Hedonic) = 1*Judge2 + .956*Judge3 + + .821*Judge96
with the same standardization than for the previous blocks.

The latent variable scores are given in Table 2.5 and their correlations in Table 2.6. The correlations
between these scores and the manifest variables are given in Table 2.7.
Tableau 2.5: ULS-SEM latent variable scores
Pampryl r.t.
Tropicana r.t.
Fruivita refr.
Joker r.t.
Tropicana refr.
Pampryl refr.
Physico-chemical
-0.72
1.05
0.81
-1.54
0.56
-0.16
Sensorial
-1.26
0.43
0.87
-0.77
1.27
-0.53
Hedonic
-1.10
0.66
1.17
-0.84
0.85
-0.74
Tableau 2.6: ULS-SEM latent variable score correlation matrix

Physico-chemical
Sensorial
Hedonic
Physico-chemical
1.000
.810
.867
Sensorial
.810
1.000
.961
Hedonic
.867
.961
1.000
12
Tableau 2.7: Correlations between the ULS-SEM LV scores and the MVs
Glucose
Fructose
Saccharose
Sweetening power
pH before centrifugation
Titer
Citric acid
Vitamin C
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
Judge2
Judge3
Judge6
Judge11
Judge12
Judge25
Judge30
Judge31
Judge35
Judge48
Judge52
Judge55
Judge59
Judge60
Judge63
Judge68
Judge77
Judge79
Judge84
Judge86
Judge91
Judge92
Judge96
Physico-chemical
-0.898
-0.898
0.926
0.078
0.950
0.939
-0.973
-0.977
-0.195
0.229
0.806
0.558
-0.401
-0.745
-0.775
0.871
0.640
0.647
0.656
0.872
0.718
0.971
0.742
0.343
0.771
0.460
0.791
0.504
0.534
0.870
0.343
0.909
0.734
0.718
0.953
0.453
0.827
0.724
0.554
Sensorial
-0.585
-0.575
0.755
0.288
0.896
0.904
-0.735
-0.740
-0.040
0.410
0.976
0.704
-0.646
-0.927
-0.951
0.967
0.928
0.756
0.662
0.785
0.929
0.817
0.518
0.693
0.936
0.837
0.840
0.878
0.592
0.854
0.693
0.670
0.473
0.929
0.934
0.685
0.845
0.419
0.679
Hedonic
-0.673
-0.673
0.817
0.242
0.947
0.946
-0.765
-0.774
-0.001
0.174
0.893
0.625
-0.552
-0.950
-0.976
0.979
0.887
0.877
0.794
0.919
0.823
0.864
0.637
0.712
0.926
0.834
0.944
0.863
0.458
0.924
0.712
0.666
0.396
0.823
0.941
0.762
0.927
0.595
0.744
The estimate of the hedonic score, shown in Table 2.5, enables us to classify the products by order of
preference:
Fruivita refr. > Tropicana refr. > Tropicana r.t. > Pampryl refr. > Joker r.t. > Pampryl r.t.
Using the significant regression weights of Table 2.1 and the correlations given in Table 2.7, we may
conclude that the physico-chemical score is correlated negatively with the fructose, glucose, titer and
citric acid characteristics and positively with the saccharose, pH before and after centrifugation
13
characteristics. The sensorial score is correlated positively with odor typicity and sweetness and
negatively with acidity and bitterness.
The hedonic score related to the homogenous group of judges is correlated positively with the
physico-chemical (.867) and sensorial scores (.961). Consequently, this group of judges likes
products with odor typicity and sweetness (Fruivita refr., Tropicana r.t., Tropicana refr.) and rejects
products with an acidic and bitter nature (Joker r.t., Pampryl refr., Pampryl r.t.). This result is
verified in Table 3.
Table 3: Sensorial characteristics of the products ranked according to the hedonic score
odor
hedonic
Product
sweeteness typicity acidity bitterness
score
_____________________________________________________________________
Fruivita refr.
3.4
2.88
2.42
1.76
1.17
Tropicana refr.
3.3
3.02
2.33
1.97
0.85
Tropicana r.t.
3.3
2.82
2.55
2.08
0.66
--------------------------------------------------------------------Pampryl refr.
2.9
2.73
3.31
2.63
-0.74
Joker r.t.
2.8
2.59
3.05
2.56
-0.84
Pampryl r.t.
2.6
2.53
3.15
2.97
-1.10
2.
Use of PLS Path modeling
For estimating the parameters of the model, we have used the module XLSTAT-PLSPM of the
XLSTAT software (XLSTAT, 2007). The variables have all been standardized. To calculate the
inner estimates of the latent variables, we have used the centroid scheme recommended by Herman
Wold (1985).
Table 4 contains the output of this modelling of the orange juice data with comments. Figure 4
includes the regression coefficients between the latent variables of the model shown in Figure 1 and
the correlation coefficients between the manifest and latent variables.
Coefficient validation
Although it gives robust and stable results with the various methods used on these orange juice data
(the same items appear to be significant in Tenenhaus, Pags, Ambroisine, Guinot (2005) and in the
present paper), we may think that bootstrap validation carried out on only 6 cases cannot be very
reliable. One reason is the following: In this example, data structure comes from the opposition
between the two groups {Fruivita refr., Tropicana r.t., Tropicana refr.} on one side and {Pampryl
refr., Joker r.t., Pampryl r.t.} on the other side. If one of these groups of products is not selected in
the bootstrap sampling selection, then the correlations between the latent variables disappear.
Maybe, non representative samples should be eliminated.
Bootstrap has been based on 200 samples and 90% confidence intervals have been asked for.
Results of bootstrap validation for the inner model are shown in Table 3.7. The confidence intervals
indicate the regression coefficients which are significant. We can also look to the usual Student t
related to the regression coefficients. By convention, a coefficient is significant if the absolute value
of t is larger than 2. In this specific example, both methods give the same results. The relationship
between the hedonic data and the physico-chemical data is not significant (t = 1.522), while that
between the hedonic data and the sensorial data is (t = 3.546). There is also a significant connection
between the physico-chemical and the sensorial data (t = 2.864).
14
Figure 4: XLSTAT-PLSPM software output for the orange juice data
15
However, the strong correlation between the hedonic data and the physico-chemical data suggests
that a PLS regression of the hedonic score should be carried out on the physico-chemical and
sensorial scores. This PLS regression (with one component) leads to the following equation:
Hedonic score = 0.49*(Physico-Chemical score) + 0.53*(Sensorial score)
2
with an R = 0.948 to be compared with R2 = 0.960 in the model shown in Figure 4.

Bootstrap validation for PLS regression yields to the same significant regression coefficients as for
OLS regression (see Table 4). If the PLS regression is validated by Jack-knife on the observed latent
variables, both coefficients are now significant (Table 4 and Figure 5).
Figure 5: XLSTAT-PLSPM software output: Validation of the PLS regression of the hedonic score
on the physico-chemical and sensorial scores
Hedonic / Standardized coefficients
(95% conf. interval)
0.7
Standardized coefficients
0.6
Senso rial
0.5
P hysico -chemical
0.4
0.3
0.2
0.1
Variable
Table 3: XLSTAT-PLSPM outputs for the orange juice example

Table 3.1: Block dimensionality
Latent variable
Physico-chemical
Dimensions
9
Critical value
1.800
Sensorial
1.400
Hedonic
23
4.600
Eigenvalues
6.213
1.410
1.046
0.317
0.013
4.744
1.333
0.820
0.084
0.019
14.655
3.663
2.199
1.837
0.646
Comment: The critical value is equal to the average eigenvalue. In this example the number of
eigenvalues is equal to 5 as the number of observations (6) is smaller than the number of variables
and because the variables are centered. Each block can be considered as unidimensional.
16
Table 3.2: Checking block dimensionality (larger correlation per row is in bold)
Variables/Factors correlations (Physico-chemical):
Glucose
Fructose
Saccharose
Sweetening power
Titer
Citric acid.
F1
0.914
0.913
-0.912
-0.035
-0.945
-0.933
0.974
0.978
F2
0.388
0.378
0.261
0.947
0.019
0.071
-0.144
-0.136
Vitamin C
0.212
-0.328
F3
-0.057
-0.083
0.286
0.319
-0.026
-0.069
0.070
0.049
0.916
F4
0.109
0.121
-0.127
-0.020
0.325
0.346
0.150
0.144
F5
-0.013
-0.050
0.049
0.017
0.006
-0.006
0.062
0.052
0.080
-0.035
Variables/Factors correlations (Sensorial):
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
F1
F2
F3
F4
F5
0.460
0.985
0.722
-0.650
-0.913
-0.935
0.955
0.754
0.134
0.617
0.429
0.348
0.188
-0.159
-0.468
-0.058
0.298
0.626
-0.021
-0.285
0.187
0.008
0.077
-0.096
0.005
0.205
-0.028
0.161
0.004
0.041
-0.031
0.048
-0.057
0.093
0.048
Variables/Factors correlations (Hedonic):
judge2
judge3
judge6
judge11
judge12
judge25
judge30
judge31
judge35
judge48
judge52
judge55
judge59
judge60
judge63
judge68
judge77
judge79
judge84
judge86
judge91
judge92
judge96
F1
F2
F3
F4
F5
0.894
0.890
0.798
0.919
0.814
0.849
0.625
0.733
0.925
0.852
0.948
0.878
0.438
0.922
0.733
0.638
0.363
0.814
0.928
0.778
0.927
0.585
0.755
-0.218
-0.318
-0.039
0.051
0.221
0.422
0.399
-0.565
0.035
-0.474
-0.049
-0.398
0.475
0.051
-0.565
0.763
0.862
0.221
0.352
-0.410
0.043
0.348
0.307
-0.310
-0.166
-0.177
0.213
-0.177
-0.631
0.321
0.329
0.207
-0.286
0.131
0.740
-0.149
0.321
-0.063
0.235
0.213
0.115
-0.372
0.069
-0.282
-0.203
-0.018
0.522
0.278
-0.429
-0.166
-0.058
0.179
0.179
-0.039
0.068
-0.164
0.176
-0.154
0.179
-0.072
-0.169
-0.429
0.032
-0.229
0.364
0.676
0.132
0.104
-0.247
0.212
-0.243
0.205
-0.221
0.090
-0.060
-0.074
-0.115
-0.166
-0.062
0.317
0.090
-0.041
0.203
-0.243
0.012
-0.187
0.043
0.000
-0.285
-0.331
-0.430
0.231
17
Table 3.3: Model validation

Goodness of fit index:
GoF
GoF (Bootstrap)
Standard error
Critical ratio (CR)
Absolute
Relative
Outer model
0.731
0.823
0.911
0.732
0.801
0.852
0.049
0.048
0.039
14.943
17.146
23.286
Inner model
0.903
0.940
0.048
18.815
Lower bound
(90%)
Upper bound
(90%)
Minimum
1st
Quartile
Median
3rd
Quartile
Maximum
Absolute
Relative
Outer model
0.645
0.707
0.784
0.799
0.847
0.893
0.522
0.525
0.707
0.711
0.790
0.816
0.738
0.811
0.863
0.762
0.824
0.865
0.821
0.855
0.911
Inner model
0.885
0.999
0.669
0.921
0.941
0.966
1.000
Comment: Number of bootstrap samples = 200. Level of the confidence intervals: 90%
Absolute Goodness-of-Fit
1
1
Cor 2 ( x jh , j )
R 2 ( j ; i explaining j )
Nb of endogenous LV Endogenous VL
J j h
GoF =
where J = p j
j
Relative Goodness-of-Fit
pj
Cor 2 ( x jh , j )
R 2 ( j ; i explaining j )
1
1
h =1
J j
Nb of endogenous LV endogenous LV
j
j2

Outer model
where:
-
Inner model
j is the first eigenvalue computed from the PCA of block j

j is the first canonical correlation between the dependent block j and the
concatenation of all the blocks i explaining the dependent block j.
18
Table 3.4: Latent Variable validation

Cross-loadings (Monofactorial manifest variables):
Glucose
Fructose
Saccharose
Sweetening power
Titer
Citric acid.
Vitamin C
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
judge2
judge3
judge6
judge11
judge12
judge25
judge30
judge31
judge35
judge48
judge52
judge55
judge59
judge60
judge63
judge68
judge77
judge79
judge84
judge86
judge91
judge92
judge96
Comment:
-
Physicochemical
-0.889
-0.889
0.931
0.099
0.952
0.942
-0.972
-0.977
-0.194
0.236
0.814
0.574
-0.397
-0.751
-0.784
0.877
0.646
0.654
0.665
0.873
0.729
0.972
0.750
0.349
0.777
0.470
0.801
0.517
0.533
0.872
0.349
0.910
0.727
0.729
0.957
0.467
0.831
0.724
0.559
Sensorial
-0.584
-0.574
0.758
0.294
0.896
0.905
-0.738
-0.743
-0.045
0.411
0.977
0.709
-0.639
-0.925
-0.952
0.968
0.925
0.755
0.667
0.785
0.930
0.817
0.524
0.690
0.936
0.835
0.843
0.876
0.593
0.853
0.690
0.673
0.474
0.930
0.935
0.685
0.846
0.424
0.677
Hedonic
-0.689
-0.689
0.832
0.242
0.955
0.954
-0.789
-0.798
-0.023
0.199
0.904
0.637
-0.549
-0.942
-0.972
0.982
0.880
0.860
0.787
0.916
0.834
0.879
0.648
0.689
0.926
0.815
0.938
0.847
0.479
0.924
0.689
0.695
0.432
0.834
0.953
0.742
0.925
0.602
0.731
Sweetening power and Vitamin C are not correlated to their block.

Smell intensity is not correlated to its own block
Judges 59 and 77 are weakly correlated to their block.
19
Table 3.5: Latent Variable weights (non significant weights are in bold)
Latent variable
Physico-chemical
Sensorial
Hedonic
Comment:
-
Manifest variables
Outer
weight
Outer
weight
(Bootstrap)
Standard
error
Critical
ratio (CR)
Lower
bound
(90%)
Upper
bound
(90%)
Glucose
Fructose
Saccharose
Sweetening power
Titer
Citric acid.
Vitamin C
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
judge2
judge3
judge6
judge11
judge12
judge25
judge30
judge31
judge35
judge48
judge52
judge55
judge59
judge60
judge63
judge68
judge77
judge79
judge84
judge86
judge91
judge92
-0.124
-0.123
0.154
0.052
0.180
0.180
-0.148
-0.150
-0.007
0.052
0.206
0.145
-0.113
-0.203
-0.210
0.223
0.059
0.053
0.050
0.062
0.062
0.067
0.048
0.039
0.064
0.049
0.062
0.052
0.042
0.065
0.039
0.059
0.045
0.062
0.071
0.043
0.063
0.043
-0.113
-0.111
0.140
0.038
0.159
0.159
-0.137
-0.139
-0.010
0.033
0.190
0.114
-0.107
-0.190
-0.197
0.208
0.056
0.051
0.047
0.059
0.058
0.063
0.040
0.036
0.062
0.047
0.061
0.049
0.036
0.060
0.036
0.051
0.037
0.058
0.066
0.039
0.059
0.038
0.050
0.049
0.041
0.093
0.021
0.020
0.017
0.016
0.085
0.099
0.039
0.082
0.080
0.045
0.034
0.036
0.012
0.015
0.018
0.019
0.013
0.013
0.020
0.023
0.012
0.016
0.012
0.014
0.026
0.016
0.023
0.018
0.027
0.013
0.013
0.020
0.018
0.028
-2.491
-2.498
3.720
0.560
8.460
9.073
-8.808
-9.127
-0.077
0.527
5.255
1.759
-1.406
-4.472
-6.136
6.148
4.740
3.435
2.749
3.268
4.605
5.247
2.364
1.695
5.530
3.047
5.320
3.717
1.642
4.065
1.695
3.305
1.649
4.605
5.420
2.168
3.499
1.554
-0.147
-0.144
0.096
-0.115
0.121
0.121
-0.169
-0.171
-0.126
-0.136
0.143
-0.065
-0.196
-0.227
-0.240
0.143
0.043
0.034
0.023
0.048
0.037
0.051
0.000
0.000
0.050
0.020
0.049
0.029
-0.021
0.047
0.000
0.000
-0.021
0.037
0.055
-0.007
0.050
-0.007
-0.057
-0.057
0.180
0.151
0.184
0.185
-0.111
-0.113
0.130
0.166
0.241
0.207
0.069
-0.143
-0.143
0.267
0.064
0.065
0.071
0.080
0.083
0.078
0.065
0.066
0.075
0.064
0.082
0.061
0.066
0.074
0.066
0.072
0.065
0.083
0.084
0.057
0.074
0.081
judge96
0.046
0.044
0.020
2.345
0.013
0.066
Sweetening power and Vitamin C are not significant in block physico-chemical

Smell intensity, Pulp and Taste intensity are not significant in block sensorial
Judges 59, 77, 86 and 92 are not significant in block hedonic.
20
Table 3.6: Correlations between MV and LV

Correlations:
Latent variable
Physico-chemical
Sensorial
Hedonic
Comments:
-
Manifest variables
Standardized
loadings
Communalities
Redundancies
Standardized
loadings
(Bootstrap)
Standard
error
Glucose
Fructose
Saccharose
Sweetening power
pH before
centrifugation
Titer
Citric acid.
Vitamin C
Smell intensity
Odor typicity
Pulp
Taste intensity
Acidity
Bitterness
Sweetness
judge2
judge3
judge6
judge11
judge12
judge25
judge30
judge31
judge35
judge48
judge52
judge55
judge59
judge60
judge63
judge68
judge77
judge79
judge84
judge86
judge91
judge92
-0.889
-0.889
0.931
0.099
0.790
0.790
0.867
0.010
-0.850
-0.847
0.876
0.101
0.226
0.227
0.268
0.591
0.952
0.942
-0.972
-0.977
-0.194
0.411
0.977
0.709
-0.639
-0.925
-0.952
0.968
0.880
0.860
0.787
0.916
0.834
0.879
0.648
0.689
0.926
0.815
0.938
0.847
0.479
0.924
0.689
0.695
0.432
0.834
0.953
0.742
0.925
0.602
0.906
0.887
0.946
0.954
0.038
0.169
0.954
0.503
0.408
0.856
0.907
0.936
0.774
0.740
0.619
0.840
0.695
0.773
0.419
0.475
0.858
0.664
0.879
0.717
0.230
0.853
0.475
0.483
0.186
0.695
0.909
0.551
0.856
0.363
0.113
0.641
0.338
0.274
0.575
0.609
0.629
0.743
0.710
0.594
0.806
0.667
0.742
0.403
0.456
0.824
0.638
0.844
0.688
0.221
0.820
0.456
0.464
0.179
0.667
0.873
0.529
0.822
0.348
0.968
0.964
-0.950
-0.956
-0.203
0.285
0.940
0.612
-0.589
-0.915
-0.949
0.967
0.859
0.828
0.736
0.885
0.852
0.896
0.570
0.619
0.923
0.785
0.936
0.809
0.447
0.893
0.619
0.685
0.426
0.852
0.943
0.670
0.895
0.531
0.078
0.073
0.066
0.063
0.435
0.497
0.110
0.382
0.404
0.206
0.087
0.069
0.182
0.236
0.254
0.230
0.115
0.159
0.284
0.340
0.147
0.233
0.130
0.215
0.406
0.214
0.340
0.258
0.404
0.115
0.137
0.308
0.215
0.411
judge96
0.731
0.535
0.514
0.709
0.281
Standardized loading = correlation

Communality = squared correlation
Redundancy = Communality*R2(Dep. LV; Explanatory related LVs)
21
Table 3.6: Correlations between MV and LV (continued)

Correlations:
Latent variable
Manifest variables
Glucose
Fructose
Saccharose
Sweetening power
Physico-chemical pH before centrifugation
Titer
Citric acid.
Vitamin C
Smell intensity
Odor typicity
Pulp
Sensorial
Taste intensity
Acidity
Bitterness
Sweetness
judge2
judge3
judge6
judge11
judge12
judge25
judge30
judge31
judge35
judge48
judge52
Hedonic
judge55
judge59
judge60
judge63
judge68
judge77
judge79
judge84
judge86
judge91
judge92
judge96
Critical ratio
(CR)
Lower bound
(90%)
Upper bound
(90%)
-3.938
-3.913
3.476
0.167
12.199
12.966
-14.729
-15.413
-0.445
0.827
8.912
1.856
-1.580
-4.495
-10.967
13.953
4.832
3.642
3.102
3.993
7.258
5.541
2.277
2.030
6.293
3.493
7.234
3.942
1.179
4.315
2.030
2.694
1.069
7.258
6.934
2.411
4.301
1.465
-0.995
-0.998
0.729
-0.860
0.925
0.899
-1.000
-1.000
-0.970
-0.684
0.763
-0.174
-0.998
-1.000
-1.000
0.940
0.639
0.469
0.347
0.773
0.648
0.742
0.000
0.000
0.825
0.408
0.858
0.425
-0.426
0.716
0.000
0.000
-0.426
0.648
0.911
-0.093
0.783
-0.192
-0.569
-0.589
0.996
0.950
1.000
1.000
-0.860
-0.872
0.656
0.930
0.999
0.998
0.203
-0.752
-0.897
1.000
0.981
0.999
0.982
0.994
0.999
0.997
0.936
0.991
0.997
0.997
0.998
0.996
0.948
0.997
0.991
0.997
0.896
0.999
0.998
0.992
0.986
0.982
2.602
0.173
0.997
Comment: (identical with those for weights)

Sweetening power and Vitamin C are not significant in block physico-chemical
Smell intensity, Pulp and Taste intensity are not significant in block sensorial
Judges 59, 77, 86 and 92 are not significant in block hedonic.
22
Table 3.7: Inner model

R (Sensorial):
R
0.672
R(Bootstrap)
0.791
Standard
error
0.157
Critical ratio (CR)

4.276
Lower bound
(90%)
0.588
Upper bound (90%)

0.999
Path coefficients (Sensorial):

Latent variable
Physico-chimique
Latent variable
Value
Standard error
Pr > |t|
Value(Bootstrap)
0.820
0.286
2.864
0.046
0.835
Standard
error(Bootstrap)
Critical ratio
(CR)
Lower bound
(90%)
Upper bound (90%)
0.308
2.660
0.757
0.994
Physico-chemical
Comment:
The usual Student t test and the bootstrap approach give here the same results.
R (Hedonic):
R
0.960
R(Bootstrap)
0.986
Standard
error
0.017
Critical ratio (CR) Lower bound (90%) Upper bound (90%)

58.017
0.947
1.000
Path coefficients (Hedonic):

Latent variable
Value
Standard error
Pr > |t|
Physico-chemical
0.306
0.201
1.522
0.225
Sensorial
0.713
0.201
3.546
0.038
Latent variable
Physico-chemical
Sensorial
Value(Bootstrap)
Standard
error(Bootstrap)
Critical ratio
(CR)
Lower bound
(90%)
Upper bound
(90%)
0.331
0.651
0.698
0.674
0.438
1.058
-0.642
0.000
1.000
1.397
Comment:
The usual Student t test and the bootstrap approach give here the same results. But the nonsignificance of the physico-chemical can also be due to a multicolinearity problem. PLS regression
for estimating the structural regression equations can be used and is presented in Table 4.
23
Table 3.8: Impact and contribution of the variables to Hedonic

Impact and contribution of the variables to Hedonic:
Sensorial
0.964
0.713
0.688
71.612
71.612
Correlation
Path coefficient
Correlation * path coefficient
Contribution to R (%)
Cumulative %
Physico-chemical
0.891
0.306
0.273
28.388
100.000
Impact and contribution of the variables to Hedonic

0.8
100
80
Path coefficients
0.6
0.5
60
0.4
40
0.3
0.2
20
Contribution to R (%)
0.7
0.1
0
0
Senso rial
P hysico -chemical
Latent variable
Path coefficient
Cumulative %
Comment:
R 2 (Y ; X 1 ,..., X k ) = Cor (Y , X j ) * j
When all the terms Cor (Y , X ) * are positive, it makes sense to compute the
j
relative contribution of each explanatory variable Xj to the R square.
24
Table 3.9: Model assessment
Latent variable
Type
Physico-chemical
Sensorial
Hedonic
Exogenous
Endogenous
Endogenous
Mean
Adjusted
R
0.000
0.000
0.000
0.672
0.960
0.672
0.950
Mean
Weighted Mean
Communalities
(AVE)
Mean Redundancies
0.687
0.676
0.634
0.454
0.609
0.654
0.532
0.816
Comments:
The weighted mean takes into account the number of MVs in each block.
(Absolute GoF)2 = (Mean R2)*(Weighted mean communalities)
Table 3.10: Correlation between the latent variables

Correlations (Latent variable):
Physico-chemical
Sensorial
Hedonic
Physico-chemical
Sensorial
1.000
0.820
0.820
1.000
0.891
0.964
Hedonic
0.891
0.964
1.000
Table 3.11: Direct, indirect and total effects

Direct effects (Latent variable):
Physico-chemical
Physico-chemical
Sensorial
0.820
Hedonic
0.306
Sensorial
Hedonic
0.713
Comment :
Sensorial = .820*Physico-chemical
Hedonic = .306*Physico-chemical + .713*Sensorial
25
Table 3.11: Direct, indirect and total effects (continued)

Indirect effects (Latent variable):
Physico-chemical
Physico-chemical
Sensorial
0.000
Hedonic
0.585
Sensorial
Hedonic
0.000
Comment :
Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical

Indirect effect of Physico-chemical on Hedonic = .713*.820 = 0.585
Total effects (Latent variable):
Physico-chemical
Sensorial
Hedonic
Physico-chemical
Sensorial
0.820
0.891
0.713
Hedonic
Comment :
Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical

= .891*Physico-chemical
Table 3.12: Discriminant validity

Discriminant validity (Squared correlations < AVE) :
Physico-chemical
Sensorial
Hedonic
Mean Communalities (AVE)
Physico-chemical
1
0.672
0.793
Sensorial
0.672
1
0.930
Hedonic
0.793
0.930
1
0.687
0.676
0.634
Comment: Due to non significant MVs, the AVE criterion is too small for the three LVs.
26
Table 3.13: Latent variable score

Summary statistics / Latent variable scores:
Observations
Minimum
Maximum
Mean
Std.
Deviation
Physico-chemical
Sensorial
6
6
-1.680
-1.381
1.120
1.378
0.000
0.000
1.000
1.000
Hedonic
-1.203
1.253
0.000
1.000
Variable
Latent variable scores :

Physico-chemical
Sensorial
Hedonic
-0.810
1.120
0.917
-1.680
0.630
-0.176
-1.381
0.462
0.964
-0.852
1.378
-0.570
-1.203
0.742
1.253
-0.991
0.946
-0.747
pampryl r. t.
tropicana r. t.
fruvita refr.
joker r. t.
tropicana refr.
pampryl refr.
Table 4: PLS regression of Hedonic score on Physico-chemical and sensorial scores

Goodness of fit statistics (Variable Hedonic):
R
0.948
Bootstrap validation
Latent variable
Value
Value
(Bootstrap)
Standard error
(Bootstrap)
Critical ratio
(CR)
Lower bound
(90%)
Upper bound
(90%)
Physico-chemical
Sensorial
0.490
0.531
0.267
0.744
0.408
0.402
1.201
1.320
-0.422
0.103
0.893
1.397
Jack-knife validation on the observed latent variables

Standardized coefficients (Variable Hedonic):
Variable
Physico-chemical
Sensorial
Coefficient
0.490
Std.
deviation
0.021
Lower bound
(95%)
0.449
Upper bound
(95%)
0.531
0.531
0.022
0.488
0.573
27
3.
Comparison between PLS, ULS-SEM and PCA
Comparison between weights
When we compare the weight confidence intervals computed with PLS (Table 3.5) with those
coming from ULS-SEM (Table 2.1), we find that both methods yield to the same non significant
weights with only one exception for Judge 86 (non significant for PLS and significant for ULSSEM). These weights are compared in Figure 6.
Figure 6: Comparison between the PLS and ULS-SEM weights
28
Comparison between PLS and ULS-SEM scores
The scores coming from PLS and ULS-SEM are compared in Figure 7. They are highly correlated.
This confirms our previous findings and a general remark of Noonan and Wold (1982) on the fact
that the final outer LV estimates depend very little on the selected scheme of calculation of the inner
LV estimates.
Figure 7: Comparison between the PLS and ULS-SEM scores
29
Comparison between the PLS and ULS-SEM scores and the block principal components
The correlations between the PLS and USL-SEM scores with the block principal components are
given in Table 5.
Table 5: Correlation between the PLS and ULS-SEM scores and the block principal components
ULS-SEM scores
.999
.998
.999
st
Physico-chemical 1 PC
Sensorial 1st PC
Hedonic 1st PC
PLS scores
.997
.998
.997
We may conclude that ULS-SEM, PLS and principal component analysis are giving practically the
same scores on this orange juice example.
II.
Exploratory factor analysis, ULS-SEM and PCA
If the structural model is limited to one standardized latent variable (or common factor) described
by a vector x composed of p centred manifest variables, one gets the decomposition
(20)
x = x +
It is usual to add the following hypotheses:
(21)
E () = 0,
= Cov() = E ( ') is diagonal
Cov( , ) = 0
Under these hypotheses, the covariance matrix of the random vector x is written as
(22)
= E (xx ') = x 'x +
The parameters x and in model (22) can now be estimated using the ULS method. This means
, minimizing the criterion
searching for the parameters x and
(23)
)
S ( x 'x +
where S is the matrix of empirical covariances. To remove the indetermination on the global sign of
the vector x (if x is a solution, then - x is also a solution), the solution can be chosen to make the
sum of the coordinates positive. This is the option chosen in AMOS 6.0.
The advantage of the ULS method over the other more frequently used GLS (Generalized Least
Squares) or ML (Maximum Likelihood) methods lies in its ability to function with a singular
covariance matrix S, particularly in situations where the number of observations is less than the
number of variables.
30
The quality of the fit is measured by the GFI written here as
GFI = 1
(24)
)
S ( x 'x +
Principal component analysis (PCA) is found again if one imposes the additional condition
=0
(25)
In this case, one seeks to minimise the criterion

S x 'x
(26)
The vector x is now equal to
1 u1 , where u1 is the normed eigenvector of the covariance matrix
S associated with the largest eigenvalue 1.

For each MV xj, the explained variance (or communality) is therefore jj = 1u12j . The residual
variance (or specificity) j is then estimated by j = s jj 1u12j .
The quality of the fit can still be measured by the GFI:
2
S - x 'x ( s jj jj )
2
GFI = 1
(27)
j
2
The square of the norm of S is equal to the sum of the squared eigenvalues h of S. In PCA,
2
S - ' is equal to the sum of the squares of the p-1 last eigenvalues of S. Consequently, in PCA
x
one obtains
(28)
GFI =
12 + ( s jj 1u12j )
h =1
2
h
Moreover, SEM softwares allow the computation of confidence intervals for parameters by
bootstrapping. They also allow criterion (26) to be minimised by imposing value constraints or
equality constraints on the coordinates of the vector x . We can continue to use criterion (27) to
measure the quality of the model.
31
Link between ULS-SEM, Factor Analysis, PLS and Principal Component Analysis
A central point in PLS path modelling concerns the relation between the MVs related to one LV and
this LV.
Reflective mode
The reflective mode is common to PLS and SEM. In this mode, each MV is related to its LV by a
simple regression:
(29)
x j = j + j
This model corresponds to the usual one-dimension factor analysis (FA) model. Minimization of
criterion (23) allows the estimation of the parameters of this model. As the diagonal terms of the
) are automatically null, the path coefficient j are computed with the
residual matrix S ( x 'x +
objective of reconstruction of the covariance matrix terms outside the diagonal. The average
variance extracted (AVE), defined by jj / s jj , measures the summary power of the LV. It is
j
not the first objective in this approach. It is an a posteriori value of the model.
In a one block of variables situation, it is natural to estimate the LV using the first principal
component of the MVs. The minimization of criterion (26) yields to this solution. Furthermore, the
diagonal terms of the residual S x 'x are now taken into account in the minimization. The path
coefficients j are now computed with the objective of reconstruction of the whole covariance matrix
terms, diagonal included. The AVE still measures the summary power of the LV. But it is now a
part of the objective in this approach. Consequently, in the ULS-SEM context, PCA can be obtained
by considering the FA model (22) and then by cancelling in a first step the residual measurement
variances.
In PLS path modelling softwares, the one block situation has been implemented. In this situation,
the outer estimate of the block LV is also taken as the inner estimate. Therefore, Mode A yields to
the following equation:
(30)
Cov( x j , ) x j
j
The PLS algorithm will converge to the first principal component of the block of MVs, solution of
equation (30).
Formative mode
The formative mode is easy to implement in PLS. In this mode, each LV is related to its MVs by a
multiple regression:
(31)
= jxj +
j
But in a one block situation, it is an indeterminate problem.
32
Conclusion
The residual sum of squares (RESS), defined by
(s
ij
i< j
ij ) , is smaller for FA than for PCA. On

2
the other hand the AVE is larger for PCA than for FA.
Example 2
We use data on the cubic capacity, power output, speed, weight, width and length of 24 car models
in production in 2004 given in Tenenhaus (2007). We compare on these data FA and PCA with
respect to the RESS and AVE criterions. The analyses are carried out on standardized variables. The
correlation matrix is given in Table 6.
Table 6: Car example: Correlation matrix
Capacity
Power
Speed
Weight
Width
Length
Capacity Power
1
0.954
1
Speed Weight Width Length

0.885 0.692 0.706 0.664
0.934 0.529 0.730 0.527
1
0.466 0.619 0.578
1
0.477 0.795
1
0.591
1
The path models for one-dimension FA and PCA are given in Figure 8. The common factor is
denominated as F1. The implied covariance matrices and the residual matrices produced by AMOS
are given in Table 7.
FA
PCA
.02
1
Capacity
.00
1
Capacity
e1
e1
.00
.14
1
Power
.99
F1
Speed
.87
e3
.53
Weight
F1
e4
e2
.00
.92
1.00
.68
Power
.96
.25
.93
1.00
e2
Speed
.89
e3
.00
.76
Weight
e4
.80
.74
.00
.45
.73
Width
.80
e5
Width
Length
e5
.00
.47
1
Length
e6
Figure 8 : Path model for FA and PCA
33
e6
Table 7: Car example: Implied covariance matrices and residuals produced by AMOS
FA
Capacity
Power
Implied
Speed
correlations
Weight
Width
Length
Residuals
PCA
Capacity
Power
Speed
Weight
Width
Length
Capacity
Power
Implied
Speed
correlations
Weight
Width
Length
Residuals
Capacity
Power
Speed
Weight
Width
Length
Capacity
1
Power
.918
1
Speed
.860
.804
1
Weight
.678
.633
.593
1
Width
.737
.689
.645
.508
1
Capacity
0
Power
0.036
0
Speed
0.025
0.130
0
Weight
0.014
-0.104
-0.127
0
Width
-0.031
0.041
-0.026
-0.031
0
Capacity
.926
Power
.889
.853
Speed
.853
.818
.785
Weight
.738
.699
.671
.573
Width
.771
.740
.710
.606
.642
Capacity
0.074
0
0
0
0
0
Power
0.065
0.147
0
0
0
0
Speed
0.032
0.116
0.215
0
0
0
Weight
-0.046
-0.170
-0.205
0.427
0
0
Width
-0.065
-0.010
-0.091
-0.129
0.358
0
Length
.722
.674
.632
.498
.541
1
Length
-0.058
-0.147
-0.054
0.297
0.050
0
Length
.765
.734
.705
.602
.637
.632
Length
-0.101
-0.207
-0.127
0.193
-0.046
0.368
The comparison between FA and PCA results is shown in Table 8.

Table 8: Comparison between FA and PCA approaches
RESS
.169
.230
FA
PCA
AVE
.690
.735
GFI
.983
.978
For PCA, the GFI produced by AMOS has to be modified according to formula (27). The usual
PCA of standardized data results in the following eigenvalues: 4.4113, .8534, .4357, .2359, .0514
is therefore measured by the
and .0124. The quality of the approximation of S by 1u1u1' +
following value of the GFI:
(32)
GFI =
12 + ( s jj 1u12j )
j
h =1
2
h
34
19.459 + .519
= .978
20.436
We then used AMOS 6.0 to carry out a first-order PCA of these standardized data under the
hypothesis of equality of weights for the engine variables "cubic capacity, power, speed" and
similarly equality of weight for the passenger compartment variables "weight, width, length".
Figure 9 shows the results of this estimation and Table 9 the 90% bootstrap confidence intervals.
The bootstrap intervals contain values greater than 1 because the bootstrap samples no longer consist
of standardized variables.
.00
1
Capacity
e1
.00
Power
.92
e2
.92
.00
1
Speed
.92
e3
1.00
.00
F1
.78
Weight
e4
.78
.00
.78
Width
e5
.00
Length
e6
Figure 9 : PCA under constraints on the "Auto 2004" data (AMOS 6.0 output)
Table 9: PCA under constraints for the "Auto 2004" data (AMOS 6.0 output)
Estimation and bootstrap confidence interval for the coordinates of x
Parameter
Capacity - F1
Power
- F1
Speed
- F1
Weight - F1
Width
- F1
Length - F1
Estimate
.924
.924
.924
.784
.784
.784
Inf (90%)
.542
.542
.542
.555
.555
.555
Sup (90%)
1.195
1.195
1.195
1.003
1.003
1.003
The GFI for the model with constraints has the following value provided by AMOS:
(33)
GFI * = 1
S - x 'x
S
= .9505
35
Using the modified formula yields to:

GFI = GFI * +
(34)
(s
jj
x2j
= .9505 +
.509
= .975
20.436
The very slight reduction of the GFI (.975 vs .978) means that one can accept the model with
constraints.
In this example, we obtain the component as the "McDonald" estimation of the factor ,
calculated as follows:
.924(capacity* + power* + speed*) + .784(weight* + length* + width*)

where the asterisk means that the variable is standardized.
III.
Confirmatory factor analysis, ULS-SEM and analysis of multi-block data
We assume now that the random column vector x breaks down into J blocks of random vectors
x j = ( x j1 ,..., x jp j ) ' . A specific model with one standardized latent variable (and usual hypotheses) is
constructed for each block xj:
(35)
x j = j j + j , j = 1,..., J
J
This model is similar to model (4) with x = j . For each block j we have
j=1
(36)
x j = j 'j + j
and for two blocks j and k we get

(37)
x j xk = jk j 'k
where jk = Cor ( j , k ) .
Decomposition (7) thus becomes
(38)
= j j ' +
The parameters 1,, J, and in model (38) can now be estimated by using the ULS method.
minimizing the criterion
and
This means seeking the parameters 1 ,..., J ,
(39)

( ) '+
S ( j )
j
Adding constraint (25) gives a new criterion to be minimized:

(40)
( ) '
S ( j )
j
36
This results in a new factorisation of the covariance matrix allowing an estimation to be made of
both the loadings and also the correlations between the factors. The quality of the fit is still
measured by the GFI criterion.
Example 3
In this example we are going to study data about wine tasting described in detail in Pags, Asselin,
Morlat & Robichet (1987).
Description of the data
A collected of 21 red wines of Bourgueil, Chinon and Saumur appellations is described by a set of
27 taste variables divided into 4 blocks:
X1 = Smell at rest
Rest1 = smell intensity at rest, Rest2 = aromatic quality at rest, Rest3 = fruity note at rest, Rest4 =
floral note at rest, Rest5 = spicy note at rest
X2 = View
View1 = visual intensity, View2 = shading (from orange to purple), View3 = surface impression
X3 = Smell after shaking

Shaking1 = smell intensity, Shaking2 = smell quality, Shaking3 = fruity note, Shaking4 = floral
note, Shaking5 = spicy note, Shaking6 = vegetable note, Shaking7 = phenolic note, Shaking8 =
aromatic intensity in mouth, Shaking9 = aromatic persistence in mouth, Shaking10 = aromatic
quality in mouth
X4 = Tasting
Tasting1 = intensity of attack, Tasting2 = acidity, Tasting3 = astringency, Tasting4 = alcohol,
Tasting5 = balance (acidity, astringency, alcohol), Tasting6 = mellowness, Tasting7 = bitterness,
Tasting8 = ending intensity in mouth, Tasting9 = harmony
These data have already been analysed using PLS in Tenenhaus & Esposito Vinzi (2005) and in
Tenenhaus & Hanafi (2007). We present here the ULS-SEM solution on the standardized variables
with cancellation of the residual measurement variances. First of all, we present the PCA for each
separate block in Table 10.
37
Table 10: Principal component analysis of each block for the "Wine" data
Smell at rest
Smell intensity at rest

Aromatic quality at rest
Fruity note at rest
Floral note at rest
Spicy note at rest
Component
1
2
.741
.551
.915
-.144
.854
-.191
.345
-.537
.077
.933
View
1
Visual intensity
Shading (from orange to purple)
Surface impression
Component
2
.986
-.146
.983
-.163
.947
.320
Smell after shaking
Smell intensity
Smell quality
Fruity note
Floral note
Spicy note
Vegetable note
Phelonic note
Aromatic intensity in mouth
Aromatic persistence in mouth
Aromatic quality in mouth
Component
1
2
.472
.743
.881
-.180
.819
-.176
.328
-.500
.089
.746
-.635
.593
.370
.633
.895
.277
.888
.307
.882
-.372
Tasting
Intensity of attack
Acidity
Astringency
Alcohol
Balance (acidity, astringency, alcohol)
Mellowness
Bitterness
Ending intensiry in mouth
Harmony
38
Component
1
2
.937
.082
-.257
.691
.775
.427
.774
.378
.844
-.423
.901
-.380
.377
.760
.967
.117
.958
-.233
Use of ULS-SEM for the analysis of multi-block data
All the variables are standardized: S = R. The correlation matrix R is now approximated using
criterion (40), with the aid of the following factorisation formula:
1
0
R (1 ,..., 4 , ) =
0
0
2
0
0
0
0
3
0
0 1 12 13 14 1'
0 21 1 23 24 0
0 31 32 1 34 0
4 41 42 43 1 0
0
'2
0
0
0
0
3'
0
0
0
'4
11'
12 1 '2 13 1 3' 14 1 3'
21 2 1'
23 2 3' 24 2 '4
2 '2
=
31 3 1' 32 3 '2
34 3 '4
3 3'
'
'
'
4 '4
41 4 1 42 4 2 43 4 3
In this way, confirmatory factor analysis (or perhaps rather confirmatory PCA), in the context
described here, allows the best first order reconstruction of the intra- and inter-block correlations.
First analysis
Using AMOS 6.0, we obtained Table 11 and the diagram shown in Figure 10. Confirmatory factor
analysis of the four blocks echoes in essence the results of the first principal components of the
separate PCAs for each block. We have put the significant loadings in bold in Table 11. They well
correspond to the strongest variablePC1 correlations given in Table 10. There are two exceptions:
Smell intensity at rest in block 1 and Astringency in block 4. It should be noted that these variables
have fairly high correlations with the second principal components. The GFI is less than 0.9. This is
due to the existence of the second dimensions for blocks 1, 3 and 4.
Second analysis
In order to better identify the first dimension of the phenomenon under study, it is usual in
confirmatory factor analysis to "purify" the scales: the analysis is repeated, omitting the nonsignificant variables. This is where Table 12 and Figure 11 come from. All the correlations between
the manifest and latent variables of the corresponding block, and the correlations between the latent
variables, are now strongly positive. All the correlations are significant. The first dimension of the
phenomenon under study has therefore been perfectly identified. The GFI of 0.983 is excellent and
well shows the unidimensionality of the selected variables.
39
Table 11: Confirmatory factor analysis of the "Wine" data (AMOS 6.0 output)
(Significant coefficient in bold, non-significant in italic)
Parameter
Estimate
Inf (95%)
Sup (95%)
<---
Rest 1
0.623
-0.768
0.993
0.170
<---
Rest 1
0.944
0.176
1.19
0.018
Fruity note at rest
<---
Rest 1
0.808
0.162
1.128
0.010
Floral note at rest
<---
Rest 1
0.529
-0.416
0.935
0.108
Spicy note at rest
<---
Rest 1
-0.003
-1.064
0.716
...
Visual intensity
<---
View
0.951
0.424
1.319
0.046
Shading
<---
View
0.931
0.486
1.256
0.045
Surface impression
<---
View
1.028
0.266
1.375
0.036
Smell intensity
<---
Shaking 1
0.614
-0.612
1.083
0.183
Smell quality
<---
Shaking 1
0.828
0.347
1.062
0.017
Fruity note
<---
Shaking 1
0.752
0.146
1.042
0.018
Floral note
<---
Shaking 1
0.240
-0.658
0.849
0.353
Spicy note
<---
Shaking 1
0.264
-0.655
0.864
0.253
Vegetable note
<---
Shaking 1
-0.570
-0.999
0.114
0.096
Phenolic note
<---
Shaking 1
0.392
-0.219
0.741
0.200
<---
Shaking 1
0.928
0.176
1.237
0.041
<---
Shaking 1
0.955
0.105
1.299
0.024
<---
Shaking1
0.801
0.175
1.040
0.020
Intensity of attack
<---
Tasting 1
0.897
0.105
1.334
0.016
Acidity
<---
Tasting 1
-0.208
-1.012
0.546
0.595
Astringency
<---
Tasting 1
0.808
-0.276
1.155
0.076
Alcohol
<---
Tasting 1
0.809
0.104
1.251
0.040
Balance
<---
Tasting 1
0.841
0.129
1.168
0.026
Mellowness
<---
Tasting 1
0.893
0.224
1.207
0.021
Bitterness
<---
Tasting 1
0.373
-0.789
0.851
0.326
Ending intensity in mouth
<---
Tasting 1
0.969
0.195
1.336
0.018
Harmony
<---
Tasting 1
0.958
0.295
1.312
0.014
Estimate
Inf (95%)
Sup (95%)
Rest 1
Parameter
<-->
View
0.724
0.069
0.866
0.027
Rest 1
<-->
Shaking 1
0.866
0.725
0.960
0.010
Rest 1
<-->
Tasting 1
0.736
0.578
0.863
0.010
View
<-->
Shaking 1
0.827
0.218
0.950
0.026
View
<-->
Tasting 1
0.887
0.201
0.962
0.020
Shaking 1
<-->
Tasting 1
0.916
0.764
0.968
0.010
GFI
.849
40
.00
.00
e1
1
.00
e20
1
e3
e2
1
.00
.81.94
rest4
.53
.00
View
.87
shaking1
.83
1
shaking2
.61
.83
1
shaking3
e9
.00
1
e8
.00
1
e21
.00
shaking4
shaking6
.89
.80
.96
e24
1
shaking9
e23
1.00
tasting7
.84
.00
e25
.00
e14
.00
1
e15
tasting4
1
.00
1
e18
e26
1
tasting6
tasting5
tasting3
.00
.37
.81
.81
tasting2
1
.00
tasting8
.89
-.21
e19
.97
Tasting 1
.90
tasting1
1
.00
e27
.96
shaking8
.00
1
tasting9
1
.00
.00
.92
shaking10
shaking7
e22
1.00
Shaking 1
.26
-.57
.39
.93
shaking5
.74
.75
.24
e10
.00
1
.93 .95
1.00
1.03
.72
e12
.00
e11
view1
view2
Rest 1
e13
.00
.00
view3
1.00
.00
rest5
rest1
.62
e7
e6
1
rset2
.00
.00
e5
e4
rest3
.00
.00
.00
e16
e17
Figure 10 : Confirmatory factor analysis of the "Wine" data (AMOS 6.0 output)
41
Table 12: Confirmatory factor analysis of the "Wine" data on

the significant variables (p-value < .05) of Table 11 (AMOS 6.0 output)
Parameter
Fruity note at rest
Visual intensity
Shading
Surface impression
Smell quality
Fruity note
Intensity of attack
Alcohol
Balance
Mellowness
Harmony
Parameter
Rest 1
<--> View
Rest 1
<--> Shaking 1
Rest 1
<--> Tasting 1
View
<--> Tasting 1
View
<--> Shaking 1
Shaking 1 <--> Tasting 1
<--<--<--<--<--<--<--<--<--<--<--<--<--<--<--<---
Rest 1
Rest 1
View
View
View
Shaking 1
Shaking 1
Shaking 1
Shaking 1
Shaking 1
Tasting 1
Tasting 1
Tasting 1
Tasting 1
Tasting 1
Tasting 1
Estimate
0.645
0.870
0.690
0.837
0.792
0.897
Estimate
0.992
0.885
0.942
0.924
1.039
0.885
0.822
0.917
0.923
0.854
0.891
0.795
0.885
0.930
0.960
0.975
Inf (95%)
0.303
0.694
0.302
0.411
0.329
0.752
.983
GFI
42
Inf (95%) Sup (95%)

0.649
1.197
0.510
1.181
0.539
1.311
0.623
1.243
0.536
1.386
0.569
1.090
0.417
1.076
0.385
1.246
0.363
1.288
0.581
1.065
0.346
1.337
0.247
1.232
0.432
1.173
0.496
1.222
0.438
1.327
0.497
1.323
Sup (95%)
0.806
0.948
0.842
0.948
0.921
0.957
P
0.01
0.01
0.01
0.01
0.01
0.01
P
0.01
0.01
0.01
0.01
0.01
0.01
0.01
0.01
0.01
0.01
0.01
0.02
0.01
0.01
0.01
0.01
.00
.00
.00
e5
e3
e2
.89
.99
view3
rest2
rest3
e7
e6
.00
.00
view1
view2
.94
1.04 .92
.65
1.00
1.00
View
Rest 1
.87
.00
.79
e12
shaking2
.88
.00
1
shaking3
e11
.69
1.00
.82
Shaking 1
.00
.92
1
shaking8
e22
.92
.84
.85
.00
1
e23
.90
shaking9
.00
e24
1.00
shaking10
.97
Tasting 1
e27
.96
.89
.93
.00
tasting8
e26
.89
.79
tasting1
tasting6
.00
e19
tasting5
1
.00
tasting4
1
.00
.00
e14
e15
e16
Figure 11: Confirmatory factor analysis of the "Wine" data on

the significant variables of Table 9 (AMOS 6.0 output)
43
.00
tasting9
Third analysis
As the four LVs appearing in Figure 11 are highly correlated, it is natural to summarize these LVs
through a second order confirmatory factor analysis. This yields to Figure 12. The regression
coefficient of one MV of each block has been put to 1. The second order LV Score 1 is similar to
the standardized first principal component of the first order LVs as the error variances have been
put to zero. The first order LVs are evaluated by using the McDonald approach. For example,
using the path coefficients shown in Figure 12, we get:
Score(Rest 1) 1 rest2* + .88 rest3*
view1
view2
view3
1.00
1.12 .98
1.00
.88
rest2
rest3
e7
e6
e5
.00
1
.00
d1
Rest 1
.00
.00
e3
e2
.00
.00
.00
.00
View
.83
d2
.84
e12
shaking2
1.00
1.04
.00
.96
shaking3
e11
Score 1
.82
Shaking 1
.00
1.08
1
shaking8
e22
1.09
.00
1.00
.00
d3
e23
.94
shaking9
.00
e24
shaking10
1
.00
1.00
1
Tasting 1
d4
tasting9
.99
.91
.95
tasting8
.00
e26
.91
.82
.00
e27
tasting1
tasting6
.00
e19
tasting5
1
.00
tasting4
1
.00
.00
e14
e15
e16
Figure 12: Second order confirmatory factor analysis of the "Wine" data on
the significant variables of Table 9 (AMOS 6.0 output)
44
In the same way, the second order LV (Score 1) can be computed as a weighted sum of all the
MVs. The regression coefficient of Score 1 in the regression of an MV on Score 1 is equal to
the product of the path coefficients related to the link between this MV and its LV and to the link
between the LV and Score 1. For example
Cov(rest2,Score 1)
= Cov(12 Rest 1 + 12 ,Score 1) = 12Cov(Rest 1,Score 1) = 12
Cov(Rest 1,Score 1)
Var (Score 1)
as the latent variable Score 1 is standardized. This leads to:

n1
Score
.83 1 rest2* + .88 rest3* + " + .94 .91 tasting1* + " + 1 tasting9*
But this formula has a severe drawback: it gives more weight to a block containing many variables
than to a block with few variables. From a pragmatic point of view, we prefer to compute a
weighted sum of the first order standardized LV estimates, using the path coefficients relating the
first order LVs to the second order LV. In fact, these weights are reflecting the quality of the
approximation of the second order LV by the first order LVs. This leads to what is called here
Global score (1):
Global score (1)

.83 Score (Rest 1) + " + .94 Score (Tasting 1)
The correlation table between these scores is given in Table 13. All the computed first order LVs
are well positively correlated and very well summarized by the computed second order LV.
Table 13: Correlation between scores related to the first dimensions of the wine data
Correlations
Rest 1
View 1
Shaking 1
Tasting 1
Global score 1
Rest 1
1
.671
.687
.546
.802
View 1
.671
1
.794
.838
.921
Shaking 1
.687
.794
1
.897
.942
Tasting 1
.546
.838
.897
1
.920
Global
score 1
.802
.921
.942
.920
1
Fourth analysis
To identify the second dimension of the phenomenon under study, we will construct a new
confirmatory PCA model for the manifest variables not taken into account in the second analysis.
The non-significant variables were eliminated iteratively as before. This is where Figure 13 and
Table 14 come from. All the correlations between the manifest and latent variables of the
corresponding block, and the correlations between the latent variables, are now strongly positive.
All the correlations are significant. The second dimension of the phenomenon studied has therefore
been identified. The value of the GFI is 0.919; this means that this second dimension can be
accepted.
45
.00
e4
rest1
.96
1.00
.00
Rest 2
.78
e20
rest5
.00
1
.75
shaking1
e13
.94
.77
1.00
.00
.78
Shaking 2
shaking5
e9
.60
.00
e21
.79
shaking7
1.00
Tasting 2
.85
.91
tasting3
tasting7
1
.00
.00
e25
e17
Figure 13: Confirmatory factor analysis of the "Wine" data

on the variables of Table 8 (AMOS 6.0 output)
Table 14: Confirmatory factor analysis of the "Wine" data on the non-significant variables of Table
9 (AMOS 6.0 output). Results after iterative elimination of non-significant variables.
Estimate
Inf (95%)
Sup (95%)

Spicy note at rest
Parameter
<--- Rest 2
<--- Rest 2
0.957
0.778
0.545
-0.202
1.228
1.196
0.01
0.07
Smell intensity
<--- Shaking 2
0.940
0.475
1.324
0.01
Spicy note
Phenolic note
<--- Shaking 2
<--- Shaking 2
0.781
0.603
0.192
0.054
1.145
1.113
0.01
0.04
Astringency
Bitterness
<--- Tasting 2
<--- Tasting 2
0.912
0.849
0.580
0.042
1.217
1.345
0.01
0.02
Parameter
Estimate
Inf (95%)
Sup (95%)
Shaking 2
Shaking 2
<-->
<-->
Tasting 2
Rest 2
0.791
0.754
0.490
0.487
0.918
0.903
0.01
0.01
Tasting 2
<-->
Rest 2
0.775
0.468
0.903
0.01
GFI
.919
46
Fifth analysis
The three LVs appearing in Figure 13 being highly correlated, they are summarized as above
through a second order confirmatory factor analysis. This yields to Figure 14. Scores related to the
second dimension are computed in the same way as those related to the first dimension. The
correlation table related to these scores is given in Table 15. Comments are the same as for Table
13.
.00
.00
e20
1
e4
1
rest5
rest1
1.00
1.25
.00
1
Rest 2
d1
.70
.00
1
.00
shaking1
e13
1
.00
1.34
Score 2
.54
Shaking 2
shaking5
e9
1.00
d3
1.62
.78
1.00
.00
e21
shaking7
.00
Tasting 2
1.08
tasting3
d4
1.00
tasting7
1
1
.00
.00
e25
e17
Figure 14: Second order confirmatory factor analysis of the "Wine" data on
the variables of Table 12 (AMOS 6.0 output)
Table 15: Correlation between scores related to the second dimensions of the wine data
Correlations
Rest 2
Shaking 2
Tasting 2
Global score 2
Rest 2
1
.758
.776
.908
Shaking 2
.758
1
.793
.933
47
Tasting 2
.776
.793
1
.925
Global
score 2
.908
.933
.925
1
Remarks:
1. The first dimension consists of variables all positively correlated with the global quality grade
(available elsewhere). These correlations are given in Table 16. The second dimension, on the other
hand, is relative to variables not correlated with the global quality grade.
2. It may be wished to obtain orthogonal components in each block. Then, it would be necessary to
use the deflation process, i.e. to construct a new analysis on the residuals of the regression of each
original block Xj on its first computed latent variable LVj.
Table 16: Correlation between the variables related to the two dimensions
and the global quality grade
Variables related to dimension 1
Global quality

Fruity note at rest
0.62
0.50
Visual intensity
0.54
Shading (from orange to purple)

Surface impression
0.51
0.67
Smell quality
0.76
0.61
0.68
0.85
Intensity of attack
Alcohol
Balance (acidity, astringency, alcohol)
Mellowness
Harmony
0.77
0.52
0.95
0.92
0.80
0.88
Global score 1
0.73
Variables related to dimension 2
Global quality

Spicy note at rest
0.04
-0.31
Smell intensity after shaking

Spicy note after shaking
Phelonic note
0.17
-0.08
0.09
Astringency
0.41
Bitterness
0.05
Global score 2
0.08
Graphical displays
Using Global scores (1) and (2), we obtain three graphical displays. The variables are described
with their correlations with Global scores (1) and (2). The individuals are visualized with these two
global scores using appellation and soil markers. These graphical displays are given in Figures 15,
16 and 17. Figures 16 and 17 show clearly that soil is a much better predictor of wine quality than
appellation. All the wines produced on a reference soil are positive on Score 1. The reader
interested in wine can even detect that the two Saumur 1DAM and 2DAM are the best wines from
48
this sample. I can testify that I drank outstanding Saumur-Champigny produced at Dampierre-surLoire.
Figure 15: Graphical display of the variables
Figure 16: Graphical display of the wine with appellation markers
49
Figure 17: Graphical display of the wine with soil markers

IV.
Comparison between the ULS-SEM and PLS approaches.
The die is not cast and the ULS-SEM approach is not uniformly more powerful than the PLS
approach. We have set out the "pluses" and "minuses" of each approach in Table 16.
V.
Conclusion
Roderick McDonald has thrown a bridge between the SEM and PLS approaches by making use of
three ideas: (1) using the ULS method, (2) setting the variances of the residual terms of the
measurement model to 0, and (3) estimating the latent variables by using the loadings of the MVs
on their LVs. The McDonald approach has some very promising implications. Using a SEM
software such as AMOS 6.0 makes it possible to get back to PCA, to the analysis of multi-block data
and to a "data analysis" approach for SEM completely similar to the PLS approach. We have
illustrated this process with three examples, corresponding to these different themes. We have listed
the advantages and disadvantages of the two approaches. We end this paper with a wish: that this
ULS-SEM approach be included in a PLS-SEM software. The user would then have access to a very
comprehensive toolbox for a "data analysis" approach to structural equation modelling.
50
Table 16: Comparison between the ULS-SEM and PLS approaches.

ULS-SEM
The
"pluses"
PLS
Global criterion well identified

Use of SEM softwares
Parameters can be subject to
constraints
Use of bootstrapping on all the
model parameters
Better measurement of the
quality of the theoretical model
Non-recursive model allowed
The
"minuses"
Possible difficulty in model

identification
Possible non-convergence of the
algorithm
Explicit calculation of LVs is
outside the SEM software
Missing data are not permitted
No identification problem
Systematic convergence of the
PLS algorithm
General framework for multiblock data analysis
Robust method for small-size
samples
Possibility of several LVs per
block exists in PLS-Graph
software
Explicit calculation of LVs
integrated in PLS softwares
Easy handling of missing data
Algorithm is often closer to an
heuristic than to the optimisation
of a global criterion
It is impossible to impose
constraints on the parameters
Measurement of the quality of the
inner model is underestimated
Measurement of the quality of the
outer model is overestimated
Non-recursive model prohibited
References
Arbuckle, J.L. (2005): AMOS 6.0. AMOS Development Corporation, Spring House, PA.
Bollen, K. A. (1989): Structural Equations with Latent Variables, John Wiley & Sons.
Chin W.W. (2001): PLS-Graph Users Guide, C.T. Bauer College of Business, University of Houston,
USA.
Hwang, H. & Takane Y. (2004) : Generalized structured component analysis, Psychometrika, 69, 1, 81-99.
McDonald, R.P. (1996): Path analysis with composite variables, Multivariate Behavioral Research,
31 (2), 239-270.
Noonan, R. & Wold, H. (1982): PLS path modeling with indirectly observed variables: a comparison
of alternative estimates for the latent variable. In: Jreskog, K.G., Wold, H. (Eds.), Systems under
Indirect Observation. North-Holland, Amsterdam, pp. 7594.
Pags J., Asselin C., Morlat R., Robichet J. (1987): Analyse factorielle multiple dans le traitement de
donnes sensorielles : Application des vins rouges de la valle de la Loire, Sciences des aliments,
7, 549-571)
Tenenhaus, M. (2007): Statistique : Mthodes pour dcrire, expliquer et prvoir, Dunod, Paris.
Tenenhaus M., Esposito Vinzi V., Chatelin Y.-M., Lauro C. (2005): PLS path modeling.
Computational Statistics & Data Analysis, 48, 159-205.
51
Tenenhaus, M. & Esposito Vinzi, V. (2005): PLS regression, PLS path modeling and generalized
Procustean analysis: a combined approach for multiblock analysis, Journal of Chemometrics, 19,
145-153.
Tenenhaus, M. & Hanafi M. (2007): A bridge between PLS path modelling and multi-block data
analysis , in Handbook of Partial Least Squares (PLS): Concepts, Methods and Applications (V.
Esposito Vinzi, W. Chin, J. Henseler, H. Wang, Eds), Volume II in the series of the Handbooks of
Computational Statistics, Springer, in press.
Tenenhaus, M., Pags, J., Ambroisine, L. & Guinot, C. (2005): PLS methodology to study
relationships between hedonic judgements and product characteristics, Food Quality and Preference,
vol. 16, n 4, pp. 315-325.
XLSTAT (2007): XLSTAT-PLSPM module, XLSTAT software, Addinsoft, Paris.
52

Structural Equation Modelling For Small Samples

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Structural Equation Modelling For Small Samples

Uploaded by

Copyright:

Available Formats

Structural Equation Modelling for small samples

where B is a zero-diagonal m m matrix of regression coefficients, a m k matrix of regression

The column vector y, obtained by concatenation of the yjs, is written as

and consequently (3) becomes

Factorisation of the manifest variable covariance matrix

xy = x ' [ (I - B) '] y '

From which we finally obtain:

yx y [(I - B) 1 (' + )][(I - B) ']1 y ' +

is null, even when it yields to negative variance

i.e. the proportion of S

explained by the model. By convention, the model under study is

acceptable when the GFI is greater than 0.90.

can be deduced from the CMIN criterion given in

where N is the number of cases.

Using the McDonald approach, the GFI given by AMOS is equal to

The "McDonald" approach for LV evaluation

From this, we deduce that formula (16) can also be written as

This approach is standard in customer satisfaction studies.

Table 2: Outputs of AMOS 6.0

1) Confidence intervals are computed by bootstrapping on 200 bootstrap samples.

Tableau 2.2: Variances

Tableau 2.3: Squared Multiple Correlations

Tableau 2.4: Model Fit Summary

Figure 3: Loading plot for the PCA of judges

with the same standardization than for the previous block.

with the same standardization than for the previous blocks.

Tableau 2.6: ULS-SEM latent variable score correlation matrix

Use of PLS Path modeling

Figure 4: XLSTAT-PLSPM software output for the orange juice data

with an R = 0.948 to be compared with R2 = 0.960 in the model shown in Figure 4.

Table 3: XLSTAT-PLSPM outputs for the orange juice example

Variables/Factors correlations (Sensorial):

Variables/Factors correlations (Hedonic):

Table 3.3: Model validation

Critical ratio (CR)

j is the first eigenvalue computed from the PCA of block j

Table 3.4: Latent Variable validation

Sweetening power and Vitamin C are not correlated to their block.

Sweetening power and Vitamin C are not significant in block physico-chemical

Table 3.6: Correlations between MV and LV

Standardized loading = correlation

Table 3.6: Correlations between MV and LV (continued)

Comment: (identical with those for weights)

Table 3.7: Inner model

Critical ratio (CR)

Upper bound (90%)

Path coefficients (Sensorial):

Upper bound (90%)

Critical ratio (CR) Lower bound (90%) Upper bound (90%)

Path coefficients (Hedonic):

Path coefficients (Hedonic):

Table 3.8: Impact and contribution of the variables to Hedonic

Impact and contribution of the variables to Hedonic

relative contribution of each explanatory variable Xj to the R square.

Table 3.9: Model assessment

Table 3.10: Correlation between the latent variables

Table 3.11: Direct, indirect and total effects

Table 3.11: Direct, indirect and total effects (continued)

Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical

Total effects (Latent variable):

Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical

Table 3.12: Discriminant validity

Table 3.13: Latent variable score

Hedonic = .306Physico-chemical + .713.820*Physico-chemical

Hedonic = .306Physico-chemical + .713.820*Physico-chemical

.924(capacity* + power* + speed) + .784(weight + length* + width*)