Karl G. Jöreskog
Alternative Titles
Factor Analysis at 100: The Last 50 Years
Factor Analysis: 50 Years in 50 minutes
The numbers in the diagonal of $R_c$ are called communalities. Guttman (1956) showed that the squared multiple correlation $R_i^2$ in the regression of the $i$th variable on all the other variables is a lower bound for the communality $c_i^2$ of the $i$th variable:

$$c_i^2 \geq R_i^2 \qquad (2)$$
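A minimal sketch of Guttman's bound, assuming a hypothetical one-factor population with loadings 0.8, 0.7 and 0.6 (these numbers and the function name are illustrative, not from the slides). It uses the standard identity $R_i^2 = 1 - 1/(R^{-1})_{ii}$ to obtain the squared multiple correlations from a correlation matrix and compares them with the known communalities.

```python
import numpy as np

def squared_multiple_correlations(R):
    """Return R_i^2 for every variable via 1 - 1 / (R^{-1})_{ii}."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    return 1.0 - 1.0 / np.diag(R_inv)

# Hypothetical one-factor population: the communalities are the squared loadings.
lam = np.array([0.8, 0.7, 0.6])
communalities = lam ** 2

# Correlation matrix implied by the one-factor model (unit variances).
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

smc = squared_multiple_correlations(R)
for i, (r2, c2) in enumerate(zip(smc, communalities), start=1):
    print(f"variable {i}: SMC = {r2:.4f}, communality = {c2:.4f}")
```

In this population each squared multiple correlation stays below the corresponding communality, which is why it is a safe starting value for the diagonal of the reduced correlation matrix.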
and I investigated a simple non-iterative procedure for estimating and from the sample covariance matrix S. Later I developed a maximum likelihood method for estimating model (5).
This leads to a very fast and efficient algorithm, the use of which has been very successful.
$$F_{GLS}(\Lambda, \Psi) = \frac{1}{2}\,\mathrm{tr}[(I - S^{-1}\Sigma)^2]$$
Each of these fit functions can also be minimized by minimizing the corresponding concentrated fit function (8).
As more knowledge is gained about the nature of social and psychological measurements, however, exploratory factor analysis may not be a useful tool and may even become a hindrance.

Most studies are to some extent both exploratory and confirmatory since they involve some variables of known and other variables of unknown composition. The former should be chosen with great care in order that as much information as possible about the latter may be extracted. It is highly desirable that a hypothesis which has been suggested by mainly exploratory procedures should subsequently be confirmed, or disproved, by obtaining new data and subjecting these to more rigorous statistical techniques.
$$F_V(\Lambda, \Psi) = \frac{1}{2}\,\mathrm{tr}[(S - \Sigma)V]^2$$

ULS: $V = I$
GLS: $V = S^{-1}$
ML: $V = \hat{\Sigma}^{-1}$
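The weighted family above can be sketched in a few lines, assuming the model-implied matrix $\Sigma$ is already available. The function names and the small example matrices are hypothetical. For the ML weight the sketch simply plugs in the inverse of the supplied $\Sigma$, which is an assumption about how the fitted matrix would be obtained.

```python
import numpy as np

def fit_fv(S, Sigma, V):
    """F_V = 1/2 * tr{[(S - Sigma) V]^2} for a given weight matrix V."""
    M = (S - Sigma) @ V
    return 0.5 * np.trace(M @ M)

def fit_uls(S, Sigma):
    return fit_fv(S, Sigma, np.eye(S.shape[0]))      # V = I

def fit_gls(S, Sigma):
    return fit_fv(S, Sigma, np.linalg.inv(S))        # V = S^{-1}

def fit_ml_weight(S, Sigma):
    return fit_fv(S, Sigma, np.linalg.inv(Sigma))    # V = Sigma^{-1} (assumed fitted Sigma)

# Hypothetical example: a one-factor Sigma and a slightly perturbed "sample" S.
lam = np.array([[0.8], [0.7], [0.6]])
psi2 = np.diag([0.36, 0.51, 0.64])
Sigma = lam @ lam.T + psi2
S = Sigma + np.array([[0.00, 0.03, -0.02],
                      [0.03, 0.00, 0.01],
                      [-0.02, 0.01, 0.00]])

f_uls = fit_uls(S, Sigma)
f_gls = fit_gls(S, Sigma)
f_ml = fit_ml_weight(S, Sigma)
print(f_uls, f_gls, f_ml)
```

All three values are zero when $S = \Sigma$ and positive otherwise, so they differ only in how residuals are weighted.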
The basic idea of factor analysis is the following. For a given set of response variables $x_1, \ldots, x_p$ one wants to find a set of underlying latent factors $\xi_1, \ldots, \xi_k$, fewer in number than the observed variables. These latent factors are supposed to account for the intercorrelations of the response variables in the sense that when the factors are partialed out from the observed variables, there should no longer remain any correlations between these. If both the observed response variables and the latent factors are measured in deviations from the mean, this leads to the model:

$$x_i = \lambda_{i1}\xi_1 + \lambda_{i2}\xi_2 + \cdots + \lambda_{ik}\xi_k + \delta_i, \qquad (16)$$

where $\delta_i$, the unique part of $x_i$, is assumed to be uncorrelated with $\xi_1, \xi_2, \ldots, \xi_k$ and with $\delta_j$ for $j \neq i$. In matrix notation (16) is

$$x = \Lambda\xi + \delta \qquad (17)$$
The unique part $\delta_i$ consists of two components: a specific factor $s_i$ and a pure random measurement error $e_i$. These are indistinguishable, unless the measurements $x_i$ are designed in such a way that they can be separately identified (panel designs and multitrait-multimethod designs). The term $\delta_i$ is often called the measurement error in $x_i$ even though it is widely recognized that this term may also contain a specific factor as stated above.

Two-Stage Least-Squares

$$y = \gamma' x + u,$$

$$\hat{\gamma} = S_{xx}^{-1} s_{xy},$$

$$\hat{\gamma} = (S_{zx}' S_{zz}^{-1} S_{zx})^{-1} S_{zx}' S_{zz}^{-1} s_{zy},$$

$$(n - p)^{-1}\,\hat{\sigma}_{uu}\,(S_{zx}' S_{zz}^{-1} S_{zx})^{-1}.$$
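The two estimators above can be compared on simulated data. The following is a sketch under stated assumptions: the data-generating numbers, the helper `cov`, and the residual-variance formula used for $\hat{\sigma}_{uu}$ are all illustrative choices, not taken from the slides. The regressors $x$ are built to be correlated with $u$, so the ordinary least squares formula is biased while the instrumental-variable formula is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 5000, 2, 3                      # cases, regressors, instruments (hypothetical)
gamma_true = np.array([1.0, -0.5])

z = rng.normal(size=(n, q))               # instruments, unrelated to u
Pi = np.array([[1.0, 0.0],
               [0.5, 1.0],
               [0.0, 0.8]])               # how z drives x (q x p)
e = rng.normal(size=(n, p))
u = 0.8 * e[:, 0] + rng.normal(size=n)    # u shares e[:, 0] with x, so x and u correlate
x = z @ Pi + e
y = x @ gamma_true + u

def cov(a, b):
    """Sample covariance matrix between the columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return a.T @ b / (a.shape[0] - 1)

Y = y[:, None]
S_xx, s_xy, s_yy = cov(x, x), cov(x, Y)[:, 0], cov(Y, Y)[0, 0]
S_zz, S_zx, s_zy = cov(z, z), cov(z, x), cov(z, Y)[:, 0]

# Ordinary least squares: S_xx^{-1} s_xy
gamma_ols = np.linalg.solve(S_xx, s_xy)

# Two-stage least squares: (S_zx' S_zz^{-1} S_zx)^{-1} S_zx' S_zz^{-1} s_zy
M = S_zx.T @ np.linalg.solve(S_zz, S_zx)
gamma_tsls = np.linalg.solve(M, S_zx.T @ np.linalg.solve(S_zz, s_zy))

# Assumed residual variance: s_yy - 2 g's_xy + g'S_xx g
sigma_uu = s_yy - 2 * gamma_tsls @ s_xy + gamma_tsls @ S_xx @ gamma_tsls
acov_tsls = sigma_uu * np.linalg.inv(M) / (n - p)
std_err = np.sqrt(np.diag(acov_tsls))

print("OLS :", gamma_ols)
print("TSLS:", gamma_tsls, "+/-", std_err)
```

Solving linear systems with `np.linalg.solve` instead of forming explicit inverses is a design choice for numerical stability; the explicit inverse is kept only where the covariance matrix itself is the result.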
Rotation
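The covariance matrix of $x$ is

$$\Sigma = \Lambda\Phi\Lambda' + \Psi^2, \qquad (18)$$

where $\Phi$ and $\Psi^2$ are the covariance matrices of $\xi$ and $\delta$, respectively. Let $T$ be an arbitrary non-singular matrix of order $k \times k$ and let

$$\xi^* = T\xi, \quad \Lambda^* = \Lambda T^{-1}, \quad \Phi^* = T\Phi T'.$$

Then we have identically

$$\Lambda^*\xi^* \equiv \Lambda\xi, \quad \Lambda^*\Phi^*\Lambda^{*\prime} \equiv \Lambda\Phi\Lambda'.$$

This shows that at least $k^2$ independent conditions must be imposed on $\Lambda$ and/or $\Phi$ to make these identified.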
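The indeterminacy can be checked numerically. This sketch assumes arbitrary illustrative values for the loading matrix, the factor covariance matrix and the transformation $T$ (all hypothetical), and confirms that the transformed parameters imply the same common part of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 6, 2                                # observed variables, factors (hypothetical)

Lam = rng.normal(size=(p, k))              # loading matrix
C = rng.normal(size=(k, k))
Phi = C @ C.T + np.eye(k)                  # positive definite factor covariance matrix
T = np.array([[2.0, 1.0],
              [0.5, 1.5]])                 # non-singular: determinant is 2.5

Lam_star = Lam @ np.linalg.inv(T)          # Lambda* = Lambda T^{-1}
Phi_star = T @ Phi @ T.T                   # Phi*    = T Phi T'

common = Lam @ Phi @ Lam.T
common_star = Lam_star @ Phi_star @ Lam_star.T
print(np.max(np.abs(common - common_star)))
```

Because $T$ has $k^2$ free elements, the data cannot distinguish the two parameter sets, which is the counting argument behind the $k^2$ conditions.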
where $\Lambda_2$ $(q \times k)$ consists of the last $q = p - k$ rows of $\Lambda$. The matrix $\Lambda_2$ may, but need not, contain a priori specified elements. We say that the model is unrestricted when $\Lambda_2$ is entirely unspecified and that the model is restricted when $\Lambda_2$ contains a priori specified elements.
$$x_2 = \Lambda_2 x_1 + u, \qquad (26)$$

where $u = \delta_2 - \Lambda_2\delta_1$. Each equation in (26) is of the form (19) but it is not a regression equation because $u$ is correlated with $x_1$, since $\delta_1$ is correlated with $x_1$. Let

$$x_i = \lambda_i' x_1 + u_i, \qquad (27)$$

be the $i$-th equation in (26), where $\lambda_i'$ is the $i$-th row of $\Lambda_2$, and let $x_{(i)}$ $((q - 1) \times 1)$ be a vector of the remaining variables in $x_2$. Then $u_i$ is uncorrelated with $x_{(i)}$ so that $x_{(i)}$ can be used as instrumental variables for estimating (27). Provided $q \geq k + 1$, this can be done for each $i = 1, 2, \ldots, q$.

Multigroup Analysis

Consider data from several groups or populations. These may be different nations, states, or regions, culturally or socioeconomically different groups, groups of individuals selected on the basis of some known selection variables, groups receiving different treatments, and control groups, etc. In fact, they may be any set of mutually exclusive groups of individuals that are clearly defined. It is assumed that a number of variables have been measured on a number of individuals from each population. This approach is particularly useful in comparing a number of treatment and control groups regardless of whether individuals have been assigned to the groups randomly or not.
1 = 2 = = G   (32)
1 = 2 = = G   (33)

he showed that one can estimate the mean vector and covariance matrix of $\xi$ in each group on a scale common to all groups.

Let $\bar{z}_g$ and $S_g$ be the sample mean vector and covariance matrix in group $g$, and let $\mu_g(\theta)$ and $\Sigma_g(\theta)$ be the corresponding population mean vector and covariance matrix, $g = 1, 2, \ldots, G$. The fit function for the multigroup case is defined as

$$F(\theta) = \sum_{g=1}^{G} \frac{N_g}{N} F_g(\theta), \qquad (34)$$

where $F_g(\theta) = F(\bar{z}_g, S_g, \mu_g(\theta), \Sigma_g(\theta))$ is any of the fit functions defined for a single group. Here $N_g$ is the sample size in group $g$ and $N = N_1 + N_2 + \ldots + N_G$ is the total sample size. To test the model, one can again use $c = (N - 1)$ times the minimum of $F$ as a $\chi^2$ with degrees of freedom $d = Gk(k + 1)/2 - t$, where $k$ is the number of variables.

Econometric Models

$$y_t = \alpha + B y_t + \Gamma x_t + z_t, \quad t = 1, 2, \ldots, N \qquad (35)$$

$$y_{ti} = \alpha_i + \beta_{(i)}' y_{t(i)} + \gamma_{(i)}' x_{t(i)} + z_{ti}, \qquad (36)$$

$$\Sigma = \mathrm{Cov}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} A\Gamma\Phi\Gamma'A' + A\Psi A' & \\ \Phi\Gamma'A' & \Phi \end{pmatrix} \qquad (38)$$
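Returning to the multigroup fit function (34), the weighting by group size and the degrees-of-freedom count can be sketched as follows. This is an illustration under assumptions: it ignores the mean structure, uses the standard maximum likelihood discrepancy $\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \log|S| - k$ as the single-group function, and all function names and example matrices are hypothetical.

```python
import numpy as np

def fit_ml(S, Sigma):
    """Standard ML discrepancy: log|Sigma| + tr(S Sigma^{-1}) - log|S| - k."""
    k = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - k

def multigroup_fit(S_list, Sigma_list, N_list, group_fit=fit_ml):
    """F = sum over groups of (N_g / N) * F_g."""
    N = sum(N_list)
    return sum(Ng / N * group_fit(S, Sigma)
               for S, Sigma, Ng in zip(S_list, Sigma_list, N_list))

def chi_square(F_min, N_list):
    """c = (N - 1) * minimum of F."""
    return (sum(N_list) - 1) * F_min

def degrees_of_freedom(G, k, t):
    """d = G k (k + 1) / 2 - t."""
    return G * k * (k + 1) // 2 - t

# Hypothetical two-group example with k = 3 variables and a common Sigma.
S1 = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
S2 = np.array([[1.2, 0.6, 0.5], [0.6, 1.1, 0.4], [0.5, 0.4, 0.9]])
N_list = [150, 250]
Sigma_common = (N_list[0] * S1 + N_list[1] * S2) / sum(N_list)

F = multigroup_fit([S1, S2], [Sigma_common, Sigma_common], N_list)
c = chi_square(F, N_list)
d = degrees_of_freedom(G=2, k=3, t=6)     # t = 6 free elements in the common Sigma
print(F, c, d)
```

Passing the single-group function as an argument mirrors the statement that $F_g$ may be any of the fit functions defined for one group.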
This LISREL model was generalized in 1971-72 to include models previously developed for multiple indicators of latent variables, for confirmatory factor analysis, for simultaneous factor analysis in several populations and more general models for covariance structures. The basic form of the LISREL model has remained the same ever since and is still the same model as used today.

The general form of the LISREL model, due to its flexible specification in terms of fixed and free parameters and simple equality constraints, has proven to be so rich that it can handle not only the large variety of problems studied by hundreds of behavioral science researchers but also complex models, such as multiplicative MTMM models, non-linear models, and time series models, far beyond the type of models for which it was originally conceived.

[Figure: diagram with variables x1 to x7 and y1 to y4]

The first version of LISREL made generally available and with a written manual was LISREL III. It had fixed column input, fixed dimensions, only the maximum likelihood method, and users had to provide starting values for all parameters. The versions that followed demonstrated an enormous development in both statistical methodology and programming technology:

LISREL IV (1978) had Keywords, Free Form Input, and Dynamic Storage Allocation
LISREL V (1981) had Automatic Starting Values, Unweighted and Generalized Least Squares, and Total Effects
LISREL VI (1984) had Parameter Plots, Modification Indices, and Automatic Model Modification
LISREL 7 (1988) had PRELIS, Weighted Least Squares, and Completely Standardized Solution
LISREL 8 (1994) had SIMPLIS, Path Diagrams, and Non-linear Constraints
Assumptions
is uncorrelated with
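Then it follows that the mean vector and covariance matrix of $z = (y', x')'$ are

$$\mu = \begin{pmatrix} \tau_y + \Lambda_y A(\alpha + \Gamma\kappa) \\ \tau_x + \Lambda_x\kappa \end{pmatrix},$$

$$\Sigma = \begin{pmatrix} \Lambda_y A(\Gamma\Phi\Gamma' + \Psi)A'\Lambda_y' + \Theta_\epsilon & \Lambda_y A\Gamma\Phi\Lambda_x' + \Theta_{\delta\epsilon}' \\ \Lambda_x\Phi\Gamma'A'\Lambda_y' + \Theta_{\delta\epsilon} & \Lambda_x\Phi\Lambda_x' + \Theta_\delta \end{pmatrix},$$

where $A = (I - B)^{-1}$. The elements of $\mu$ and $\Sigma$ are functions of the elements of $\kappa$, $\alpha$, $\tau_y$, $\tau_x$, $\Lambda_y$, $\Lambda_x$, $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\epsilon$, $\Theta_\delta$, and $\Theta_{\delta\epsilon}$, which are of three kinds: fixed parameters that have been assigned specified values, constrained parameters that are unknown but linear or non-linear functions of one or more other parameters, and free parameters that are unknown and not constrained.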
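As a sketch of how the implied moments of $z = (y', x')'$ follow from the parameter matrices, the function below assembles the mean vector and the block covariance matrix with $A = (I - B)^{-1}$. The function name, argument order and all numerical values are hypothetical, chosen only to give a small model with two dependent latent variables, one independent latent variable, four $y$ indicators and two $x$ indicators.

```python
import numpy as np

def lisrel_implied_moments(kappa, alpha, tau_y, tau_x, Lam_y, Lam_x,
                           B, Gam, Phi, Psi, Th_eps, Th_del, Th_de):
    """Return (mu, Sigma) of z = (y', x')' implied by the parameter matrices."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)           # A = (I - B)^{-1}
    mu_y = tau_y + Lam_y @ A @ (alpha + Gam @ kappa)
    mu_x = tau_x + Lam_x @ kappa
    S_yy = Lam_y @ A @ (Gam @ Phi @ Gam.T + Psi) @ A.T @ Lam_y.T + Th_eps
    S_xy = Lam_x @ Phi @ Gam.T @ A.T @ Lam_y.T + Th_de  # lower-left block
    S_xx = Lam_x @ Phi @ Lam_x.T + Th_del
    mu = np.concatenate([mu_y, mu_x])
    Sigma = np.block([[S_yy, S_xy.T], [S_xy, S_xx]])
    return mu, Sigma

# Hypothetical parameter values.
kappa = np.array([0.5])
alpha = np.zeros(2)
tau_y, tau_x = np.zeros(4), np.zeros(2)
Lam_y = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.9]])
Lam_x = np.array([[1.0], [0.7]])
B = np.array([[0.0, 0.0], [0.5, 0.0]])
Gam = np.array([[0.6], [0.3]])
Phi = np.array([[1.0]])
Psi = np.diag([0.5, 0.4])
Th_eps = 0.3 * np.eye(4)
Th_del = 0.2 * np.eye(2)
Th_de = np.zeros((2, 4))

mu, Sigma = lisrel_implied_moments(kappa, alpha, tau_y, tau_x, Lam_y, Lam_x,
                                   B, Gam, Phi, Psi, Th_eps, Th_del, Th_de)
print(mu)
print(np.round(Sigma, 3))
```

Fixed parameters correspond to entries that are hard-coded here (the zeros and ones), while free parameters would be the entries an estimation routine is allowed to change.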
Some Formulas
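Let $s$ be a vector of the non-duplicated elements of $S$ and assume that

$$n^{1/2}(s - \sigma) \xrightarrow{d} N(0, \Omega)$$

Definitions

$k$ = number of observed variables
$s = \frac{1}{2}k(k + 1)$
$t$ = number of independent parameters $< s$
$d = s - t$
$K$ is of order $k^2 \times s$
$s$ $(s \times 1)$ $= K'\,\mathrm{vec}(S)$
$\Omega$ $(s \times s)$ $= n\,\mathrm{ACov}(s)$
$W$ $(s \times s)$ $= n\,\mathrm{Est}[\mathrm{ACov}(s)]$
$W_{NT} = 2K'(\hat{\Sigma} \otimes \hat{\Sigma})K$ under NT
$W_{NNT} = (w_{gh,ij})$ under NNT, with $w_{gh,ij} = n\,\mathrm{Est}[\mathrm{ACov}(s_{gh}, s_{ij})] = m_{ghij} - s_{gh}s_{ij}$ and
$m_{ghij} = (1/N)\sum_{a=1}^{N}(z_{ag} - \bar{z}_g)(z_{ah} - \bar{z}_h)(z_{ai} - \bar{z}_i)(z_{aj} - \bar{z}_j)$
$\Delta$ $(s \times t)$ $= \partial\sigma/\partial\theta'$ evaluated at $\hat{\theta}$   (41)
$\Delta_c$ is of order $s \times d$

The weight matrix $V$ is defined differently for different fit functions:

ULS: $V = \mathrm{diag}(1, 2, 1, 2, 2, 1, \ldots)$
GLS: $V = D'(S^{-1} \otimes S^{-1})D$
ML: $V = D'(\hat{\Sigma}^{-1} \otimes \hat{\Sigma}^{-1})D$
WLS: $V = W_{NNT}^{-1}$, or $W_{NNT}^{-}$ if $W_{NNT}$ is singular
DWLS: $V = D_W^{-1} = [\mathrm{diag}\,W]^{-1}$ with $W = W_{NT}$ or $W = W_{NNT}$

$E = \Delta' V \Delta$
$n\,\mathrm{ACov}(\hat{\theta}) = E^{-1}\Delta' V W V \Delta E^{-1}$
$n\,\mathrm{ACov}(s - \hat{\sigma}) = W - \Delta E^{-1}\Delta'$
with $W = W_{NT}$ or $W = W_{NNT}$

$c_2 = n(s - \hat{\sigma})'\Delta_c(\Delta_c' W_{NT}\Delta_c)^{-1}\Delta_c'(s - \hat{\sigma})$
$h_1 = \mathrm{tr}[(\Delta_c' W_{NT}\Delta_c)^{-1}(\Delta_c' W_{NNT}\Delta_c)]$
$c_3 = (d/h_1)c_2$
$c_4 = n(s - \hat{\sigma})'\Delta_c(\Delta_c' W_{NNT}\Delta_c)^{-1}\Delta_c'(s - \hat{\sigma})$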
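Several of the matrices in these formulas can be built directly. The sketch below makes some assumptions that the slides do not state explicitly: $D$ is taken to be the duplication matrix with $\mathrm{vec}(S) = Ds$, the non-duplicated elements are ordered $s_{11}, s_{21}, s_{22}, s_{31}, \ldots$, $K$ is taken as $D(D'D)^{-1}$, and the moments in $W_{NNT}$ use the divisor $N$. All function names are hypothetical.

```python
import numpy as np

def lower_pairs(k):
    """Index pairs (g, h) with h <= g, in the order s11, s21, s22, s31, ..."""
    return [(g, h) for g in range(k) for h in range(g + 1)]

def duplication_matrix(k):
    """D (k^2 x s) with vec(S) = D s for symmetric S (column-major vec)."""
    pairs = lower_pairs(k)
    D = np.zeros((k * k, len(pairs)))
    for col, (g, h) in enumerate(pairs):
        D[h * k + g, col] = 1.0      # position of S[g, h] in vec(S)
        D[g * k + h, col] = 1.0      # position of S[h, g] in vec(S)
    return D

def vec(S):
    return S.reshape(-1, order="F")

def w_nt(Sigma, K):
    """W_NT = 2 K'(Sigma kron Sigma) K."""
    return 2.0 * K.T @ np.kron(Sigma, Sigma) @ K

def w_nnt(Z):
    """W_NNT with elements m_ghij - s_gh s_ij, computed from raw data Z (N x k)."""
    Zc = Z - Z.mean(axis=0)
    prods = np.column_stack([Zc[:, g] * Zc[:, h] for g, h in lower_pairs(Z.shape[1])])
    m = prods.T @ prods / Z.shape[0]     # fourth-order moments m_ghij
    s_vec = prods.mean(axis=0)           # second-order moments s_gh (divisor N)
    return m - np.outer(s_vec, s_vec)

k = 3
D = duplication_matrix(k)
K = D @ np.linalg.inv(D.T @ D)           # assumed definition of K
S = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.5, 0.4],
              [0.3, 0.4, 1.0]])

s = K.T @ vec(S)                         # non-duplicated elements of S
V_uls = D.T @ D                          # diag(1, 2, 1, 2, 2, 1)
S_inv = np.linalg.inv(S)
V_gls = D.T @ np.kron(S_inv, S_inv) @ D
W = w_nt(S, K)

rng = np.random.default_rng(2)
W_data = w_nnt(rng.normal(size=(500, k)))
print(np.diag(V_uls), s)
```

With these weights the quadratic form $(s - \sigma)'V(s - \sigma)$ reproduces trace expressions such as $\mathrm{tr}[(S - \Sigma)^2]$ for ULS, which is why the off-diagonal elements receive weight 2.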
There has been an enormous development of structural equation modeling in the last 30 years. Proof:

Thousands of journal articles
Hundreds of dissertations
Numerous books
SEM specifically expresses the effects of latent variables on each other and the effect of latent variables on observed variables.
SEM can be used to test alternative hypotheses.
SEM gives social and behavioral researchers powerful tools for stating theories more exactly, testing theories more precisely, and generating a more thorough understanding of observed data.
$$f(x) = \int h(\xi) \prod_{i=1}^{p} g(x_i \mid \xi)\, d\xi$$
(45)
(i)
j =1
ij j ) F (s1
ij j )
j =1
(46)
(i)
(47) (48) %
$$F(t) = \frac{e^t}{1 + e^t}$$
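To illustrate how a marginal density of the form $\int h(\xi)\prod_i g(x_i \mid \xi)\,d\xi$ can be evaluated with a logistic response function, here is a deliberately simplified sketch. It assumes a single standard-normal factor, binary items, and Gauss-Hermite quadrature. The intercepts, loadings and function names are hypothetical, and the ordinal case in the slides would need category thresholds that are not modeled here.

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite_e import hermegauss

def logistic(t):
    """F(t) = e^t / (1 + e^t)."""
    return np.exp(t) / (1.0 + np.exp(t))

def pattern_probability(x, intercepts, loadings, n_nodes=41):
    """Approximate f(x) for one N(0, 1) factor and binary items x_i in {0, 1}."""
    nodes, weights = hermegauss(n_nodes)          # weight function exp(-t^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)      # rescale to the N(0, 1) density h
    # P(x_i = 1 | xi) at every quadrature node (items x nodes); assumed logistic form
    p1 = logistic(intercepts[:, None] + loadings[:, None] * nodes[None, :])
    x = np.asarray(x)[:, None]
    g = np.where(x == 1, p1, 1.0 - p1)            # g(x_i | xi)
    return float(np.sum(weights * np.prod(g, axis=0)))

# Hypothetical three-item example.
intercepts = np.array([-0.5, 0.0, 0.8])
loadings = np.array([1.0, 1.5, 0.7])

probs = {x: pattern_probability(x, intercepts, loadings)
         for x in product([0, 1], repeat=3)}
total = sum(probs.values())
for x, pr in probs.items():
    print(x, round(pr, 4))
```

Conditional independence given the factor is what lets the integrand be written as a product over items, so only a one-dimensional integral is needed in this single-factor case.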