
Journal of Operations Management 24 (2006) 148–169
www.elsevier.com/locate/dsw

Use of structural equation modeling in operations management research: Looking back and forward
Rachna Shah *, Susan Meyer Goldstein 1
Operations and Management Science Department, Carlson School of Management, 321 19th Avenue South, University of Minnesota, Minneapolis, MN 55455, USA
Received 10 October 2003; received in revised form 28 March 2005; accepted 3 May 2005. Available online 5 July 2005.

Abstract
This paper reviews applications of structural equation modeling (SEM) in four major Operations Management journals (Management Science, Journal of Operations Management, Decision Sciences, and Journal of Production and Operations Management Society) and provides guidelines for improving the use of SEM in operations management (OM) research. We review 93 articles from the earliest application of SEM in these journals in 1984 through August 2003. We document and assess these published applications and identify methodological issues gleaned from the SEM literature. The implications of overlooking fundamental assumptions of SEM and ignoring serious methodological issues are presented along with guidelines for improving future applications of SEM in OM research. We find that while SEM is a valuable tool for testing and advancing OM theory, OM researchers need to pay greater attention to these highlighted issues to take full advantage of its potential.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Empirical research methods; Structural equation modeling; Operations management

Note: List of reviewed articles is available upon request from the authors. * Corresponding author. Tel.: +1 612 624 4432. E-mail addresses: rshah@csom.umn.edu (R. Shah), smeyer@csom.umn.edu (S.M. Goldstein). 1 Tel.: +1 612 626 0271.

1. Introduction
Structural equation modeling as a method for measuring relationships among latent variables has been around since early in the 20th century, originating in Sewall Wright's 1916 work (Bollen, 1989). Despite a slow but steady increase in its use, it was not until the monograph by Bagozzi in 1980 that the technique was brought to the attention of a much wider audience of marketing and consumer behavior researchers. While Operations Management (OM) researchers were slow to use this new statistical approach, structural equation modeling (SEM) has more recently become one of the preferred data analysis methods among empirical OM researchers, and articles that employ SEM as the primary data analytic tool now routinely appear in major OM journals. Despite its regular and frequent application in the OM literature, there are few guidelines for the application of SEM and even fewer standards that researchers adhere to in conducting analyses and presenting and interpreting results, resulting in a large

0272-6963/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jom.2005.05.001


variance across articles that use SEM. To the best of our knowledge, there are no reviews of the applications of SEM in the OM literature, while there are regular reviews in other research areas that use this technique. For instance, focused reviews have appeared periodically in psychology (Hershberger, 2003), marketing (Baumgartner and Homburg, 1996), MIS (Chin and Todd, 1995; Gefen et al., 2000), strategic management (Shook et al., 2004), logistics (Garver and Mentzer, 1999), and organizational research (Medsker et al., 1994). These reviews have revealed vast discrepancies and serious flaws in the use of SEM. Steiger (2001) notes that even SEM textbooks ignore many important issues, suggesting that researchers may not have sufficient guidance to use SEM appropriately. Due to the complexities involved in using SEM and the problems uncovered in its use in other fields, a review specific to the OM literature seems timely and warranted.

Our objectives in conducting this review are threefold. First, we characterize published OM research in terms of relevant criteria such as software used, sample size, parameters estimated, purpose for using SEM (e.g. measurement model development, structural model evaluation), and fit measures used. In using SEM, researchers have to make subjective choices on complex elements that are highly interdependent in order to align research objectives with analytical requirements. Therefore, our second objective is to highlight these interdependencies, identify problem areas, and discuss their implications. Third, we provide guidelines to improve the analysis and reporting of SEM applications. Our goal is to promote improved usage of SEM, standardize terminology, and help prevent some common pitfalls in future OM research.

2. Overview of structural equation modeling
To provide a basis for subsequent discussion, we present a brief overview of structural equation modeling along with two special cases frequently used in the OM literature. The overview is intended to be a brief synopsis rather than a comprehensive detailing of mathematical model specification. There are a number of books (Maruyama, 1998; Bollen, 1989) and articles dealing with mathematical specification (Anderson and Gerbing, 1988), key assumptions underlying model specification (Bagozzi and Yi, 1988; Fornell, 1983), and other methodological issues of evaluation and fit (MacCallum, 1986; MacCallum et al., 1992).

At the outset, we point to a distinction in the use of two terms that are often used interchangeably in OM: covariance structure modeling (CSM) and structural equation modeling (SEM). CSM represents a general class of models that includes ARMA (autoregressive and moving average) time series models, multiplicative models for multi-faceted data, circumplex models, as well as all SEM models (Long, 1983). Thus, SEM models are a subset of CSM models. We restrict the current review to SEM models because other types of CSM models are rarely used in OM research.

Structural equation modeling is a technique to specify, estimate, and evaluate models of linear relationships among a set of observed variables in terms of a generally smaller number of unobserved variables (see Appendix A for detail). SEM models consist of observed variables (also called manifest or measured, MV for short) and unobserved variables (also called underlying or latent, LV for short) that can be independent (exogenous) or dependent (endogenous) in nature. LVs are hypothetical constructs that cannot be directly measured, and in SEM are typically represented by multiple MVs that serve as indicators of the underlying constructs. The SEM model is an a priori hypothesis about a pattern of linear relationships among a set of observed and unobserved variables. The objective in using SEM is to determine whether the a priori model is valid, rather than to find a suitable model (Gefen et al., 2000).

Path analysis and confirmatory factor analysis are two special cases of SEM that are regularly used in OM. Path analysis (PA) models specify patterns of directional and non-directional relationships among MVs. The only LVs in such models are error terms (Hair et al., 1998). Thus, PA provides for the testing of structural relationships among MVs when the MVs are of primary interest or when multiple indicators for LVs are not available. Confirmatory factor analysis (CFA) requires that LVs and their associated MVs be specified before analyzing the data. This is accomplished by restricting the MVs to load on specific LVs and by designating which LVs are allowed to correlate.


Fig. 1. Illustrations of PA, CFA, and SEM models.

A CFA model allows for directional influences between LVs and their MVs and (only) non-directional (correlational) relationships between LVs. Long (1983) provides a detailed (mathematical) treatment of each of these techniques. Fig. 1 shows graphical illustrations of SEM, PA and CFA models. Throughout this paper, we use the term SEM to refer to all three model types (SEM, PA, CFA) and note any exceptions to this.
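To make the three model types concrete, the sketch below writes one specification of each in lavaan-style syntax using the Python package semopy (one of several SEM implementations; lavaan in R is the better-known equivalent). The construct and indicator names are hypothetical and not drawn from the reviewed studies.

```python
# Sketch: the three model types of Fig. 1 in lavaan-style syntax via
# the Python package semopy (hypothetical variable names).
import pandas as pd
from semopy import Model

# CFA: each MV loads on exactly one LV; LVs covary (non-directional).
cfa_desc = """
quality  =~ q1 + q2 + q3
delivery =~ d1 + d2 + d3
quality  ~~ delivery
"""

# SEM: the same measurement model plus a directional structural path.
sem_desc = """
quality  =~ q1 + q2 + q3
delivery =~ d1 + d2 + d3
delivery ~  quality
"""

# PA: directional relationships among MVs only (error terms are the
# only latent variables).
pa_desc = "d1 ~ q1 + q2"

data = pd.read_csv("survey.csv")   # hypothetical data file
model = Model(sem_desc)
model.fit(data)                    # maximum likelihood by default
print(model.inspect())             # parameter estimates, std. errors
```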

3. Review of published SEM research Our review focuses on empirical applications of SEM which include: (1) CFA models alone, such as in measurement or validation research; (2) PA models (provided they are estimated using software which allows latent variable modeling); and (3) SEM models that combine both measurement and structural components. We exclude theoretical papers,


papers using simulation, conventional exploratory factor analysis (EFA), structural models estimated by regression models (e.g. models estimated by two stage least squares), and partial least squares (PLS) models. EFA models are not included because the measurement model is not specified a priori (MVs are not restricted to load on a specific LV and a MV can load on multiple LVs),1 whereas in SEM, the model is explicitly defined a priori. The main objective of regression and PLS models is prediction (variance explanation) in the dependent variable(s), compared to theory development and testing in the form of structural relationships (i.e. parameter estimation) in SEM. This philosophical distinction between these approaches is critical in deciding whether to use PLS or SEM (Anderson and Gerbing, 1988). In addition, because the assumptions underlying PLS and regression are less constraining than those of SEM, the problems and concerns in conducting these analyses are significantly different. Therefore, we do not include regression and PLS models in our review.

3.1. Journal selection
We considered all OM journals that are recognized as publishing high quality and relevant empirical OM research. Recently, Barman et al. (2001) ranked Management Science (MS), Operations Research (OR), Journal of Operations Management (JOM), Decision Sciences (DS), and Journal of Production and Operations Management Society (POMS) as the top OM journals in terms of quality. In the past decade, several additional reviews have examined the quality and/or relevance of OM journals and have consistently ranked these journals in the top tier (Vokurka, 1996; Goh et al., 1997; Soteriou et al., 1998; Malhotra and Grover, 1998). We do not include OR in our review as its mission does not include publishing empirical research. We selected MS, JOM, DS, and POMS as the journals most representative of high quality and relevant empirical research in OM. In our review, we include articles from these four journals that meet our methodology criteria and do not exclude articles due to topic of research.
1 Target rotation, rarely used in OM research, is an instance of EFA in which the model is specified a priori.

3.2. Time horizon and article selection
Rather than use specific search terms for selecting articles, we manually checked each article of the reviewed journals. Although more time consuming, the manual search gave us more control and better coverage than a keyword-based search because there is no widely accepted terminology for research methods in OM with which to conduct such a search. In selecting an appropriate time horizon, we started with the most recent issue of each journal available through August 2003 and moved backwards in time. Using this approach, we reviewed all published issues of JOM from 1982 (Volume 1, Number 1) to 2003 (Volume 21, Number 4) and POMS from 1992 (Volume 1, Number 1) to 2003 (Volume 12, Number 1). For MS and DS, we moved backward in time until we no longer found applications of SEM. The earliest application of SEM in DS was found in 1984 (Volume 15, Number 2) and the most recent issue reviewed is Volume 34, Number 1 (2003). The incidence of SEM in MS began in 1987 (Volume 34, Number 6) and we reviewed all issues through Volume 49, Number 8 (2003). The earliest publication in these two journals corresponds with our knowledge of the field and has face validity, as it coincides with the general timeframe when SEM was beginning to gain the attention of a wider audience in other literature streams.

In total, we found 93 research articles that satisfied our selection criteria. Fig. 2 shows the number of articles stacked by journal for the years we reviewed. This figure is very informative: overall, it is clear that the number of SEM articles has increased significantly over the past 20 years in the four journals individually

Fig. 2. Number of articles by journal and year.


and cumulatively. To assess the growth trend in the use of SEM, we regress the number of articles on an index of year of publication (beginning with 1984). We use both linear and quadratic effects of time in the regression model. The regression model is significant (F(2,17) = 39.93, p = 0.000) and indicates that 82% of the variance in the number of SEM publications is explained by the linear and quadratic effects of time. Further, the linear trend is not significant (t = 0.850, p = 0.41), whereas the quadratic effect is significant (t = 2.94, p = 0.009). Thus the use of SEM has not grown linearly as a function of time; rather, it has accelerated over time. In contrast, the use of SEM in marketing and psychology grew steadily over time and there is no indication of its accelerated use in more recent years (Baumgartner and Homburg, 1996; Hershberger, 2003).
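A minimal sketch of this kind of trend analysis in Python is shown below; the yearly counts are hypothetical placeholders, not our actual data (which are shown in Fig. 2).

```python
# Sketch: regressing yearly SEM article counts on linear and quadratic
# time terms, as in the trend analysis above (counts are hypothetical).
import numpy as np
import statsmodels.api as sm

counts = np.array([1, 0, 2, 1, 2, 3, 2, 4, 3, 5,
                   4, 6, 5, 8, 7, 9, 10, 12, 11, 14])  # 1984..2003
t = np.arange(1, len(counts) + 1)        # year index starting at 1984
X = sm.add_constant(np.column_stack([t, t ** 2]))

fit = sm.OLS(counts, X).fit()
print(fit.summary())   # overall F-test plus t-tests for both effects
```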

There are several software programs available for conducting SEM analysis, and each has idiosyncrasies and fundamental requirements for conducting analysis. In our database, 19.6% of the articles did not report the software used. Of the articles that reported the software, LISREL accounted for 48.3%, followed by EQS (18.9%), SAS (9.1%), AMOS (2.8%), RAMONA (0.7%) and SPSS (0.7%). LISREL was the first software developed to solve structural equation models and seems to have capitalized on its first mover advantage not only in psychology (MacCallum and Austin, 2000) and marketing (Baumgartner and Homburg, 1996) but also in OM.

3.3. Unit of analysis
In our review we found that multiple models were sometimes presented in one article. Therefore, the unit of analysis from this point forward (unless specified otherwise) is the actual application (one or more models for each article). A single model is included in our data set in the following situations: (1) when a single model is proposed and evaluated using a single sample; (2) when multiple alternative or nested models are evaluated using a single sample, only the final model is included in our analysis; (3) when a single model is evaluated with either multiple samples or by splitting a sample, only the model tested with the verification sample is included in our analysis. Thus, in these three cases, each article contributed only one model to the analysis. When more than one model is evaluated (using single, multiple, or split samples), each distinct model is included in our analysis. In this situation, each article contributed more than one model to the analysis. A total of 143 models were drawn from the 93 research articles; thus the overall sample size for the remainder of the paper is 143. Of the 143 models, we could not determine the method used for four models. Of the remaining 139 models, 26 are PAs, 38 are CFAs, and 75 are SEMs. A small number of articles reported models that never achieved adequate fit (by the authors' descriptions); while we include these articles in our review, their fit measures are omitted from our analysis to avoid including data related to models with inadequate fit.

4. Critical issues in the application of SEM
There are many important issues to consider when using SEM, whether for evaluating a measurement model or examining the fit of structural relationships, separately or simultaneously. Our discussion of issues is organized into three groups: (1) issues to consider or address prior to analysis, categorized under the pre-analysis stage; (2) issues and concerns to address during analysis; and (3) issues related to the post-analysis stage, which includes evaluation, interpretation and presentation of results. Decisions made at each stage are highly interdependent and significantly impact the quality of results, and we cross-reference and discuss these interdependencies whenever possible.

4.1. Issues related to pre-analysis stage
Issues related to the pre-analysis stage need to be considered prior to conducting SEM analysis and include conceptual issues, sample size issues, measurement model specification, latent model specification, and degrees of freedom issues. A summary of pre-analysis data from the reviewed OM studies is presented in Table 1.

Table 1
Issues related to pre-analysis stage

| | Path analysis models | Confirmatory factor analysis models | Structural equation models | All models (a) |
|---|---|---|---|---|
| Number of models reviewed (a) | 26 | 38 | 75 | 143 |
| Sample size: median | 125.0 | 141.0 | 202.0 | 176.0 |
| Sample size: mean | 251.2 | 245.4 | 246.4 | 243.3 |
| Sample size: range | (18, 2338) | (63, 902) | (52, 840) | (16, 2338) |
| Number of parameters estimated: median | 10.0 | 31.0 | 34.0 | 26.0 |
| Number of parameters estimated: mean | 11.3 | 38.3 | 37.5 | 31.9 |
| Number of parameters estimated: range | (2, 34) | (8, 98) | (11, 101) | (2, 101) |
| Sample size/parameters estimated: median | 9.6 | 6.2 | 5.6 | 6.4 |
| Sample size/parameters estimated: mean | 33.5 | 8.8 | 7.4 | 13.2 |
| Sample size/parameters estimated: range | (2.9, 389.7) | (2.3, 36.1) | (1.6, 25.4) | (1.6, 389.7) |
| Number of manifest variables: median | 6.0 | 12.5 | 12.0 | 11.0 |
| Number of manifest variables: mean | 6.3 | 13.5 | 16.3 | 14.0 |
| Number of manifest variables: range | (3, 10) | (4, 32) | (5, 80) | (3, 80) |
| Number of latent variables: median | Not relevant | 3.0 | 4.0 | 4.0 |
| Number of latent variables: mean | Not relevant | 3.66 | 4.7 | 4.4 |
| Number of latent variables: range | Not relevant | (1, 10) | (1, 12) | (1, 12) |
| Manifest variables/latent variable: median | Not relevant | 4.0 | 3.3 | 3.6 |
| Manifest variables/latent variable: mean | Not relevant | 5.2 | 4.1 | 4.5 |
| Manifest variables/latent variable: range | Not relevant | (1.3, 16.0) | (1.3, 9.0) | (1.3, 16.0) |
| Number of single indicator latent variables (b) | Not relevant | Reported for 1 model | Reported for 25 models | Reported for 28 models |
| Correlated measurement errors (CMEs) | 1 model unknown (c) | 11 models (28.9%) | 8 models (10.7%), 4 models unknown (c) | 19 models (13.3%), 6 models unknown (c) |
| Theoretical justification for CMEs | Not relevant | 0 (0% of CFA models with CMEs) | 4 (50% of SEM models with CMEs) | 4 (21% of all models with CMEs) |
| Recursiveness | – | – | – | 127 (88.8%) recursive; 13 (9.1%) nonrecursive; not reported or could not be determined from model description for 3 (2.1%) models |
| Evidence of model identification | Reported by 3.8% | Reported by 26.3% | Reported by 5.3% | Reported by 10.5% |
| Degrees of freedom (d.f.): median | 4.5 | 62.0 | 52.5 | 48.0 |
| Degrees of freedom (d.f.): mean | 4.6 | 90.1 | 124.5 | 99.7 |
| Degrees of freedom (d.f.): range | (1, 11) | (5, 367) | (4, 690) | (1, 690) |
| Degrees of freedom (d.f.): proportion reporting | 53.8% | 52.6% | 88.0% | 71.3% |

(a) The type of analysis performed could not be determined for 4 of 143 models published in 93 articles.
(b) The number of latent variables modeled using a single measured variable (i.e. single indicator).
(c) Presence of CMEs could not be determined due to inadequate model description.

4.1.1. Conceptual issues
An underlying assumption of SEM analysis is that the items or indicators used to measure a LV are


reflective (i.e. caused by the same underlying LV) in nature. Yet researchers frequently apply SEM to formative indicators. Formative (also called causal) indicators are measures that form or cause the creation of a LV (MacCallum and Browne, 1993; Bollen, 1989). An example of formative measures is the amount of beer, wine and hard liquor consumed as indicators of the level of mental inebriation (Chin, 1998). It can hardly be argued that mental inebriation causes the amount of beer, wine and hard liquor consumed. On the contrary, the amount of each type of alcoholic beverage affects the level of mental inebriation. Formative indicators do not need to be highly correlated or have high internal consistency (Bollen, 1989). In this example, an increase in beer consumption does not imply an increase in wine or hard liquor consumption. Measurement of formative indicators requires developing an index (as opposed to a scale when using reflective indicators), and can be modeled using SEM, but requires additional constraints (Bollen, 1989; MacCallum and Browne, 1993). Using SEM without these additional constraints makes the resulting estimates invalid (Fornell et al., 1991) and the model statistically unidentified (Bollen and Lennox, 1991).

Another underlying assumption of SEM is that the theoretical relationships hypothesized in the models being tested represent actual relationships in the studied population. SEM assesses how closely the observed data correspond to the expected patterns and requires that the relationships represented by the model are well established and amenable to accurate measurement in the population. SEM is not recommended for exploratory research when the measurement structure is not well defined or when the theory that underlies patterns of relationships among LVs is not well established (Brannick, 1995; Hurley et al., 1997). Thus, researchers need to carefully consider: (1) the type of items, (2) the state of the underlying theory, and (3) the stage of development of the measurement instrument, prior to using SEM. For formative measurement items, researchers should consider alternative techniques such as SEM with formative indicators (MacCallum and Browne, 1993) and components-based approaches such as partial least squares (Cohen et al., 1990). When the underlying theory or the measurement structure is not well developed, simpler data analytic techniques such as EFA and regression analysis may be more appropriate (Hurley et al., 1997).

4.1.2. Sample size issues
Adequacy of sample size has a significant impact on the reliability of parameter estimates, model fit, and statistical power. Using a simulation experiment to examine the effect of varying sample size to parameter estimate ratios, Jackson (2003) reports that smaller sample sizes are generally characterized by parameter estimates with low reliability, greater bias in χ² and RMSEA fit statistics, and greater uncertainty in future replication. How large a sample should be for SEM is deceptively difficult to determine because it depends upon several characteristics such as the number of MVs per LV (MacCallum et al., 1996), the degree of multivariate normality (West et al., 1995), and the estimation method (Tanaka, 1987). Suggested approaches for determining sample size include establishing a minimum (e.g., 200), having a certain number of observations per MV, having a certain number of observations per parameter estimated (Bentler and Chou, 1987; Bollen, 1989; Marsh et al., 1988), and conducting power analysis (MacCallum et al., 1996). While the first two approaches are simply rules of thumb, the latter two have been studied extensively.

Table 1 reports the results of our analysis of SEM applications in the OM literature related to sample size and number of parameters estimated. The smallest sample sizes for PA (n = 18), CFA (n = 63), and SEM (n = 52) are significantly smaller than established guidelines for models with even minimal complexity (MacCallum et al., 1996; Marsh et al., 1988). Additionally, 67.9% of all models have ratios of sample size to parameters estimated of less than 10:1 and 35.7% of models have ratios of less than 5:1. The lower end of both sample size and sample size to parameter estimate ratios are significantly smaller in the reviewed OM research than those studied by Jackson (2003), indicating that the OM literature may be highly susceptible to the negative outcomes reported in his study.

Statistical power (i.e. the ability to detect and reject a poor model) is critical to SEM analysis because, in contrast to traditional hypothesis testing, the goal in SEM analysis is to produce a


nonsignificant result between the sample data and the implied covariance matrix derived from the model parameter estimates. Yet a nonsignificant result may also be due to a lack of ability (i.e. power) to detect model misspecification. Few studies in our review mentioned power and none estimated power explicitly. Therefore, we employed MacCallum et al. (1996), who define the minimum sample size as a function of degrees of freedom that is needed for adequate power (0.80) to detect close model fit, to assess the power of the models in our sample. (We were not able to assess power for 41 of 143 models due to insufficient information.) Our analysis indicates that 37% of the models have adequate power and 63% do not. These proportions are consistent with similar analyses in psychology (MacCallum and Austin, 2000), MIS (Chin and Todd, 1995), and strategy (Shook et al., 2004), and have not changed since 1960 (Sedlmeier and Gigerenzer, 1989). We recommend that future researchers use MacCallum et al. (1996) to calculate the minimum sample size needed to ensure adequate statistical power.

4.1.3. Degrees of freedom and model identification
Degrees of freedom are calculated as d.f. = ½p(p + 1) − q, where p is the number of MVs, ½p(p + 1) is the number of equations (or, alternately, the number of distinct elements in the input matrix S), and q is the effective number of free (unknown) parameters to be estimated minus the number of implied variances. As the formula indicates, degrees of freedom are a function of model specification in terms of the number of equations and the effective number of free parameters that need to be estimated. When the effective number of free parameters is exactly equal to the number of equations (that is, the degrees of freedom are zero), the model is said to be just-identified or saturated. Just-identified models provide an exact solution for parameters (i.e. point estimates with no confidence intervals). When the effective number of free parameters is greater than the number of equations (degrees of freedom are less than zero), the model is under-identified and sufficient information is not available to uniquely estimate the parameters. Under-identified models may not converge during model estimation, and when they do, the parameter estimates they provide are not reliable and overall fit statistics cannot be interpreted (Rigdon, 1995). For models in which there are fewer unknowns than equations (degrees of freedom are one or greater), the model is over-identified. An over-identified model is highly desirable because more than one equation is used to estimate at least some of the parameters, significantly enhancing the reliability of the estimates (Bollen, 1989). Model identification is a complex issue: while non-negative degrees of freedom are a necessary condition, additional conditions such as establishing a scale for each LV are frequently required (for a detailed discourse on sufficiency conditions, see Long, 1983; Bollen, 1989).

In our review, degrees of freedom were not reported for 41 (28.7%) models (see Table 1). We recalculated the degrees of freedom independently for each reviewed model to assess discrepancies between the reported and our calculated degrees of freedom. We were not able to reproduce the degrees of freedom for 18 applications based on the authors' descriptions of their models. This lack of reproducibility may be due in part to poor model description or to correlated errors in the measurement or latent variable models that are not stated in the text. We also examined whether the issue of identification was explicitly addressed for each model. One author reported that the estimated model was not identified, and only 10.5% mentioned anything about model identification. Perhaps the issue of identification was considered implicitly because many software programs provide a warning message if a model is not identified. Model identification has a significant impact on parameter estimates: in an unidentified model, more than one set of parameter estimates could generate the observed data and a researcher has no way to choose among the various solutions because each is equally valid (or invalid, if you wish). Degrees of freedom are critically linked to the minimum sample size required for adequate model fit; the greater the degrees of freedom, the smaller the sample size needed for a given level of model fit (MacCallum et al., 1996). Calculating and reporting the degrees of freedom are fundamental to understanding the specified model, its identification, and its fit. Thus, we recommend that degrees of freedom and model identification be reported for every tested model.
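As an illustration of both recommendations, the sketch below computes degrees of freedom from the formula given earlier and then the power of the test of close fit following MacCallum et al. (1996). The model dimensions (p = 12 MVs, q = 30 free parameters, n = 200) are hypothetical, and the RMSEA values of 0.05 and 0.08 are the conventional choices for close and mediocre fit.

```python
# Sketch: degrees of freedom plus power for the test of close fit,
# following MacCallum et al. (1996); inputs are hypothetical.
from scipy.stats import ncx2

def degrees_of_freedom(p: int, q: int) -> int:
    """d.f. = p(p + 1)/2 - q, with p MVs and q free parameters."""
    return p * (p + 1) // 2 - q

def power_close_fit(df, n, eps0=0.05, eps_a=0.08, alpha=0.05):
    """Power to reject close fit (RMSEA = eps0) when RMSEA = eps_a."""
    lam0 = (n - 1) * df * eps0 ** 2     # noncentrality under H0
    lam_a = (n - 1) * df * eps_a ** 2   # noncentrality under Ha
    crit = ncx2.ppf(1 - alpha, df, lam0)
    return ncx2.sf(crit, df, lam_a)

df = degrees_of_freedom(p=12, q=30)     # 78 equations - 30 = 48 d.f.
print(df, power_close_fit(df, n=200))   # power near the 0.80 guideline
```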


4.1.4. Measurement model specification
4.1.4.1. Number of items (MVs) per LV. It is generally accepted that multiple MVs should measure each LV, but the number of MVs that should be used is less clear. A ratio of fewer than three MVs per LV is of concern because the model is statistically unidentified in the absence of additional constraints (Long, 1983). A large number of MVs per LV is advantageous in that it helps to compensate for a small sample (Marsh et al., 1988), but disadvantageous in that it means more parameters to estimate, requiring a larger sample size for adequate power. A large number of MVs per LV also makes it difficult to parsimoniously represent the measurement structure constituting the set of MVs (Anderson and Gerbing, 1984). In cases where a large number of MVs are needed to represent a LV, Bagozzi and Heatherton (1994) suggest four methods to reduce the number of MVs per LV. In our review, 24% of CFA models (9 of 38) and 39% of SEM models (29 of 75) had a MV:LV ratio of less than 3. Generally, these applications did not explicitly discuss identification issues or additional constraints. The number of MVs per LV is not applicable to PA models.

4.1.4.2. Single indicator constructs. We identified LVs represented by a single indicator in 2.6% of CFA models and 33.3% of SEM models in our sample (not applicable to PA models). The low occurrence of single indicator variables in CFA is not surprising because the central objective of CFA is construct measurement. However, the relatively high occurrence of single indicator constructs in SEM models is troublesome because single indicators ignore measurement reliability, one of the challenges SEM is designed to circumvent (Bentler and Chou, 1987). The single indicator issue is also tied to model identification, as discussed above. Single indicators are only sufficient when one measure perfectly represents a concept, a rare situation, or when measurement reliability is not an issue. Generally, single MVs should be modeled as MVs rather than LVs.

4.1.4.3. Correlated measurement errors. Measurement errors should sometimes be modeled as correlated, for instance, in a longitudinal study when the same item is measured at two points in time (Bollen, 1989, p. 232). The statistical effect of correlated error terms is the same as double loading,

but the substantive meaning is significantly different. Double loading implies that each MV is affected by two underlying LVs. Fundamental to LV unidimensionality is that each MV load on one LV, with loadings on all other LVs restricted to zero. Because adding correlated measurement errors to SEM models nearly always improves model fit, they are often used post hoc without improving the substantive interpretation of the model (Fornell, 1983; Gerbing and Anderson, 1984), and they make reliability estimates ambiguous (Bollen, 1989, p. 222). To the best of our knowledge, our sample contains no instances of double loading MVs, but we found a number of models with correlated measurement errors: 3.8% of PA, 28.9% of CFA, and 10.7% of SEM models. We read the text of each article carefully to determine whether the authors provided any theoretical justification for using correlated errors or whether they were introduced simply to improve model fit. In more than half of the applications, no justification was provided. Correlated measurement errors should be used only when warranted on theoretical or methodological grounds (Fornell, 1983), and their statistical and substantive impact should be explicitly discussed.
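For illustration, a theoretically justified correlated error, Bollen's longitudinal case above, can be written in lavaan-style syntax (shown here with semopy, which accepts the same operators; the variable names are hypothetical).

```python
# Sketch: a correlated measurement error with a theoretical rationale,
# the same item administered at two time points (names hypothetical).
from semopy import Model

desc = """
satisf_t1 =~ s1_t1 + s2_t1 + s3_t1
satisf_t2 =~ s1_t2 + s2_t2 + s3_t2
satisf_t2 ~  satisf_t1
s1_t1 ~~ s1_t2   # same item measured twice: errors allowed to correlate
"""
model = Model(desc)
```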


4.1.5. Latent model specification
4.1.5.1. Recursive/non-recursive models. Models are non-recursive when they contain reciprocal causation, feedback loops, or correlated error terms (Bollen, 1989, p. 83). In such models, the matrix representing relationships among the latent endogenous variables (B; see Appendix A for more detail) has non-zero elements both above and below the diagonal. If B is lower triangular and the errors in equations are uncorrelated, then the model is called recursive (Hair et al., 1998). Non-recursive models require additional restrictions for the model to be identified, for the stability of estimated reciprocal effects, and for the interpretation of measures of variation accounted for in the endogenous variables (for a more detailed treatment of non-recursive models, see Long, 1983; Teel et al., 1986). In our review, we examined each application for recursive and non-recursive models due to either simultaneous effects or correlated errors in equations. While we did not observe any instances of simultaneous effects, we found that in 9.1% of the models, either the authors defined their model as non-recursive or a careful reading of the article led to such a conclusion. However, even when authors explicitly stated that they were testing a non-recursive model, we saw little if any explanation of issues such as model identification in the text. We recommend that when non-recursive models are specified, the additional restrictions and their implications for model identification be explicitly stated in the paper.

4.2. Issues related to data analysis
Data analysis issues comprise examining the sample data for distributional characteristics and generating an input matrix. Distributional characteristics of the data impact researchers' choices of estimation method, and the type of input matrix impacts the selection of software used for analysis.

4.2.1. Data screening
Data screening is critical to prepare data for SEM analysis (Hair et al., 1998). Screening through exploratory data analysis includes investigating for missing data, influential outliers, and distributional characteristics. Significant missing data result in convergence failures, biased parameter estimates, and inflated fit indices (Brown, 1994; Muthén et al., 1987). Influential outliers are linked to normality and skewness issues with MVs. Assessing data normality (along with skewness and kurtosis) is important because many model estimation methods are based on an assumption of normality. Non-normal data may result in inflated goodness of fit statistics and underestimated standard errors (MacCallum et al., 1992), although these effects are lessened with larger sample sizes (Lei and Lomax, 2005).

In our review, only a handful of applications discussed missing data. In the psychology literature, listwise deletion, pairwise deletion, data imputation and full information maximum likelihood (FIML) methods are commonly used to manage missing data (Marsh, 1998). Results from Monte Carlo simulations examining the performance of these four methods indicate the superiority of FIML, which leads to the lowest rate of convergence failures, the least bias in parameter estimates, and the lowest inflation in goodness of fit statistics (Enders and Bandalos, 2001; Brown, 1994). The FIML method is currently available in LISREL (version 8.50 and above), SYSTAT (RAMONA) and AMOS.
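A minimal data screening sketch along these lines, using pandas and scipy rather than any particular SEM package (the data file and its variables are hypothetical):

```python
# Sketch: basic pre-SEM data screening -- missingness, candidate
# outliers, and univariate skewness/kurtosis (hypothetical data).
import pandas as pd
from scipy import stats

data = pd.read_csv("survey.csv")       # hypothetical file
print(data.isna().mean())              # share of missing values per MV

z = (data - data.mean()) / data.std()
print((z.abs() > 3).sum())             # candidate influential outliers

print(data.apply(lambda col: stats.skew(col.dropna())))
print(data.apply(lambda col: stats.kurtosis(col.dropna())))  # excess
```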

We found that for 26.6% of applications, normality was discussed qualitatively in the text of the reviewed articles. Estimation methods such as maximum likelihood (ML) and generalized least square (GLS) assume normality, although some non-normality can be accommodated (Hu and Bentler, 1998; Lei and Lomax, 2005). Weighted least square, ordinary least square, and asymptotically distribution free estimation methods do not require normality. Additionally, the ML, Robust option in the EQS software adjusts model fit and parameter estimates for non-normality. Finally, researchers can transform non-normal data, although serious problems have been noted with data transformation (cf. Satorra, 2001). We suggest that some discussion of data screening methods be included generally, and that normality be discussed specifically in relation to the choice of estimation method.

4.2.2. Type of input matrix
While raw data can be used as input for SEM analysis, a covariance (S) or correlation (R) matrix is generally used. In our review of the OM literature, no papers report using raw data, 30.8% report using S, and 25.2% report using R (44.1% of applications did not report the type of matrix used to conduct the analysis). Seven of 44 applications using S and 25 of 36 applications using R provide the input matrix in the paper. Not providing the input matrix makes it impossible to replicate the results reported by the author(s). While conventional estimation methods in SEM are based on statistical distribution theory that is appropriate for S but not for R, there are interpretational advantages to using R: if MVs are standardized and the model is fit to R, then parameter estimates can be interpreted in terms of standardized variables. However, it is not correct to fit a model to R while treating R as if it were a covariance matrix. Cudeck (1989) conducted an exhaustive analysis of the implications of treating R as if it were S and concludes that the consequences depend on the properties of the model being fitted: standard errors, confidence intervals and test statistics for the parameter estimates are incorrect in all cases. In some cases, parameter estimates and values of fit indices are also incorrect.
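When a published study reports R together with the MVs' standard deviations, the covariance matrix can be rebuilt as S = DRD (with D a diagonal matrix of standard deviations) and the model fit to S instead; a small numpy sketch with hypothetical values:

```python
# Sketch: rebuilding a covariance matrix S from a reported correlation
# matrix R and standard deviations (all values hypothetical).
import numpy as np

R = np.array([[1.00, 0.42, 0.35],
              [0.42, 1.00, 0.51],
              [0.35, 0.51, 1.00]])   # hypothetical reported correlations
sd = np.array([0.9, 1.3, 1.1])       # hypothetical reported std. devs

D = np.diag(sd)
S = D @ R @ D                        # covariance matrix implied by R, sd
```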


Software programs commonly used to conduct SEM deal with this issue in different ways. Correct estimation of a correlation matrix can be done in LISREL (Jöreskog and Sörbom, 1996) but requires the user to introduce specific parameter constraints. Although not widely used in OM, RAMONA (Browne and Mels, 1998), EQS (Bentler, 1989) and SEPATH (Steiger, 1999) automatically provide correct estimation with a correlation matrix. Currently, AMOS cannot analyze correlation matrices. In our review, we found 24 instances where authors reported using a correlation matrix with LISREL (out of 69 models run with LISREL), but most did not mention the necessary additional constraints. We found one instance of using AMOS with a correlation matrix. Given the lack of awareness among users about the treatment of R versus S by various software programs, we direct readers' attention to a test devised by MacCallum and Austin (2000) to help users determine whether a particular SEM program provides correct estimation of a model fit to a correlation matrix. Otherwise, it is preferable to fit models to covariance matrices, thus ensuring correct results.

4.2.3. Estimation methods
A variety of estimation methods such as maximum likelihood (ML), generalized least square (GLS), weighted and unweighted least square (WLS and ULS), asymptotically distribution free (ADF), and ordinary least square (OLS) are available. Their use depends upon the distributional properties of the MVs, and each has computational advantages and disadvantages relative to the others. For instance, ML assumes data are univariate and multivariate normal and requires that the input data matrix be positive definite, but it is relatively unbiased under moderate violations of normality (Bollen, 1989). GLS assumes normality but does not impose the restriction of a positive definite input matrix. ADF has few distributional assumptions but requires very large sample sizes for accurate estimates. OLS, the simplest method, has no distributional assumptions and is computationally the most robust, but it is not scale invariant and does not provide fit indices or standard errors for estimates. Forty-eight percent of applications in our review did not report the estimation method used. Of the applications that reported the estimation method, a majority (68.9%) used ML. Estimation method, data normality, sample size, and model specification are inextricably linked and must be considered simultaneously by the researcher. We suggest that authors explicitly state the estimation method used and link it to the properties of the observed variables.

4.3. Issues related to post-analysis
Post-analysis issues include evaluating the solution achieved from model estimation, model fit, and respecification of the model. Reports of these data from the studied sample are summarized in Tables 2a and 2b.

Table 2a
Issues related to data analysis for structural model

| Fit statistic | Number of models reporting (n = 143) | Proportion reporting (%) | Results: mean; median | Range |
|---|---|---|---|---|
| χ² | 107 | 74.8 | 204.0; 64.2 | (0.0, 1270.0) |
| χ², p-value | 76 | 53.1 | 0.21; 0.13 | (0.0, 0.94) |
| GFI | 84 | 58.7 | 0.93; 0.94 | (0.75, 0.99) |
| AGFI | 59 | 41.3 | 0.89; 0.90 | (0.63, 0.97) |
| RMR (or RMSR) | 51 | 35.7 | 0.052; 0.050 | (0.01, 0.14) (a) |
| RMSEA | 51 | 35.7 | 0.058; 0.060 | (0.00, 0.13) |
| NFI | 49 | 34.3 | 0.91; 0.92 | (0.72, 0.99) |
| NNFI (or TLI) | 62 | 43.4 | 0.95; 0.95 | (0.73, 1.07) |
| CFI | 73 | 51.0 | 0.96; 0.96 | (0.88, 1.00) |
| IFI (or BL89) | 16 | 11.2 | 0.94; 0.95 | (0.88, 0.98) |
| Normed χ² (χ²/d.f.) reported | 52 | 36.4 | 1.82; 1.59 | (0.02, 4.80) |
| Normed χ² calculated | 98 (b) | 68.5 | 2.17; 1.62 | (0.01, 21.71) |

(a) One model reported RMR = 145.4; this data point was omitted as an outlier relative to other reported RMRs.
(b) Data were not available to calculate the others.

Table 2b
Issues related to data analysis for measurement model

| Issue | Number of models reporting (n = 143) | Proportion reporting (%) |
|---|---|---|
| Reliability assessment | 123 | 86.0 |
| Unidimensionality assessment | 94 | 65.7 |
| Discriminant validity addressed | 99 | 69.2 |
| Validity issues addressed (R²; variance explained) | 76 | 53.1 |
| Path coefficients (confidence intervals) | 138 (3) | 96.5 (2.1) |
| Path t-statistics (standard errors) | 90 (21) | 62.9 (14.7) |
| Residual information/analysis provided | 19 | 13.3 |
| Specification search conducted for model respecification | 20 | 14.0 |
| Modification indices used for model respecification | 21 | 14.7 |
| Alternative models compared | 29 | 20.3 |
| Inconsistency between described and tested models | 31 | 21.7 |
| Cross-validation sample used | 22 | 15.4 |
| Split sample approach used | 27 | 18.9 |

4.3.1. Evaluation of solution
We have organized our discussion of the evaluation of solutions into overall model fit, measurement model fit, and structural model fit. To focus solely on the overall fit of the model while overlooking important

information about parameters is a common error that we encountered in our review. A model with good overall fit but nonsensical parameter estimates is not a useful model.

4.3.1.1. Overall model fit. Assessing a model's fit is one of the more complicated aspects of SEM because, unlike traditional statistical methods, it relies on nonsignificance. Historically, the most popular index used to assess overall goodness of fit has been the χ²-statistic, although its conclusions regarding model significance are generally ignored. The χ²-statistic is inherently biased when the sample size is large, yet it depends on distributional assumptions associated with large samples. Additionally, a χ²-test offers a dichotomous decision strategy (accept/reject) for assessing the adequacy of fit implied by a statistical decision rule (Bollen, 1989). In light of these issues, numerous alternative fit indices have been developed to quantify the degree of fit along a continuum (see Jöreskog, 1993; Tanaka, 1993; Bollen, 1989, pp. 256–289; and Mulaik et al., 1989 for comprehensive reviews). Fit indices are commonly distinguished as either absolute or incremental (Bollen, 1989). In general, absolute fit indices indicate the degree to which the hypothesized model reproduces the sample data, and incremental fit indices measure the proportional improvement in fit when the hypothesized model is compared with a restricted, nested baseline model (Hu and Bentler, 1998).

Absolute measures of fit: The most basic measure of absolute fit is the χ²-statistic. Other commonly used measures include the root mean square error of approximation (RMSEA), root mean square residual (RMR or SRMR), goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI). GFI and AGFI increase as goodness of fit increases and are bounded above by 1.00, while RMSEA and RMR decrease as goodness of fit increases and are bounded below by zero (Browne and Cudeck, 1989). Ninety-four percent of the applications we reviewed report at least one of these measures (Table 2a). Although the frequency of use and the magnitude of each of these measures are similar to those reported in marketing by Baumgartner and Homburg (1996), the ranges in our sample are much wider, indicating greater variability in empirical OM research. The variability may be an indication of more complex models and/or a less established theory base.

Incremental fit measures: Incremental fit measures compare the model under study to two reference models: (1) a worst case or null model, and (2) an ideal model that perfectly represents the modeled phenomena in the studied population. While there are many incremental fit indices, some of the most popular are the normed fit index (NFI), non-normed fit index (NNFI or TLI), comparative fit index (CFI) and incremental fit index (IFI or BL89). Sixty-nine percent of the reviewed studies report at least one of the four measures (Table 2a). An additional fit index that is frequently used is the normed χ², which is reported for 36.4% of models. Because the χ²-statistic by itself is beset with problems, the ratio of χ² to degrees of freedom (χ²/d.f.) is informative because it corrects for model size. Additionally, we calculated the normed χ² for all models that reported χ² and either reported degrees of freedom or enough model specification information to allow us to ascertain the degrees of freedom (68.5% of all applications) and found a median of 1.62 (range 0.01, 21.71). Small values of normed χ² (<1.0) can indicate an over-fitted model and higher values (>3.0–5.0) can indicate an under-parameterized model (Jöreskog, 1969). A brief summary of the effects of small samples, normality violations, model misspecification, and estimation method on fit indices is reported in Table 3.


An ongoing debate about the superiority or even appropriateness of one index over another makes the issue of selecting which index to use in assessing fit very complex. For instance, Hu and Bentler (1998) advise against using GFI and AGFI because they are significantly influenced by sample size and are insufficiently sensitive to model misspecification. Most fit indices are influenced by sample size and should not be interpreted independently of sample size (Hu and Bentler, 1998; Marsh et al., 1988). Therefore, no consistent criteria (i.e. cut-offs) can be defined to apply in all (or most) instances (Marsh et al., 1988). Until definitive fit indices are developed, researchers should report multiple measures of fit so that reviewers and readers have the opportunity to evaluate the underlying fit of the data to the model from multiple perspectives. χ² should be reported with its corresponding degrees of freedom in order to be insightful. RMR and RMSEA, two measures that reflect the residual differences between the input and implied (reproduced) matrices, indicate how well matrix covariance terms are predicted by the tested model. RMR in particular performs well under many conditions (Hu and Bentler, 1998; Marsh et al., 1988). Researchers might also report a summary of standardized (correlation) residuals because when most or all are "quite small relative to correlations in the tested sample" (Browne et al., 2002, p. 418), they indicate good model fit (Bollen, 1989, p. 258).
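For readers who wish to recompute indices from reported values, the sketch below implements the standard textbook definitions of the normed χ², NFI, and CFI from the χ² statistics of the tested and null (baseline) models; the input values are hypothetical.

```python
# Sketch: normed chi-square and two incremental fit indices computed
# from reported chi-square values (inputs are hypothetical).
def normed_chi2(chi2_m, df_m):
    return chi2_m / df_m

def nfi(chi2_m, chi2_null):
    return (chi2_null - chi2_m) / chi2_null

def cfi(chi2_m, df_m, chi2_null, df_null):
    d_m = max(chi2_m - df_m, 0.0)          # noncentrality, tested model
    d_null = max(chi2_null - df_null, d_m) # noncentrality, null model
    return 1.0 - d_m / d_null if d_null > 0 else 1.0

# e.g. 1.61, 0.92, 0.97 for these hypothetical values
print(normed_chi2(64.2, 40), nfi(64.2, 820.0), cfi(64.2, 40, 820.0, 55))
```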

Table 3
Influence of sample and estimation characteristics on model fit indices

| Index | Small sample (n) bias (a) | Violations of normality (b) | Model misspecification (c) | Estimation method effect (d) | General comments |
|---|---|---|---|---|---|
| Absolute | | | | | |
| χ² | Bias established (f) | – | – | No preference | – |
| GFI | Poor for small n (e); can be used (f) | – | Misspecs not identified by ADF (e) | ML preferred | Use of index not recommended (e) |
| AGFI | Poor for small n (e,f) | – | Misspecs not identified by ADF (e) | ML preferred | Use of index not recommended (e) |
| RMR (or SRMR) | ML preferred for small n (e) | Problematic with ADF (e) | Misspecs identified | ML preferred | Recommended for all analyses (e) |
| RMSEA | Tends to over-reject model (e) | Problematic with ADF (e) | Misspecs identified | No preference | Use with ADF not recommended (e) |
| Incremental | | | | | |
| NFI | Poor for small n (e) | – | Some misspecs identified | ML preferred | Use of index not recommended (e) |
| NNFI (or TLI) | Best index for small n (f); tends to over-reject model (e) | – | Misspecs identified | ML preferred | – |
| CFI | ML preferred for small n (e) | – | Misspecs identified | ML preferred | – |
| IFI (or BL89) | ML preferred for small n (e) | – | Misspecs identified | ML preferred | – |
| Normed χ² | Bias established (f) | – | – | No preference | – |

(a) While all fit indexes listed suffer small sample bias (approximately n < 250), we consolidate findings by leading researchers.
(b) Most normality violations have insignificant effects on fit indexes, except those noted.
(c) Identifying model misspecification is a positive characteristic; fit indexes that do not identify misspecification are considered poor choices.
(d) The following estimation methods were investigated: maximum likelihood (ML), generalized least square (GLS), asymptotically distribution free (ADF) (e,f).
(e) Hu and Bentler (1998).
(f) Marsh et al. (1988).

4.3.1.2. Measurement model fit. Measurement model fit can be evaluated in two ways: first, by assessing constructs' reliability and convergent and discriminant
validity, and second, by examining the individual path (parameter) estimates (Bollen, 1989). Various indices of reliability can be computed to summarize how well LVs are measured by their MVs, individually or jointly (individual item reliability, composite reliability, and average variance extracted; cf. Bagozzi and Yi, 1988; Fornell and Larcker, 1981). Our initial attempt to report the reliability measures used by the authors proved difficult due to the diversity of methods used. Therefore, we limit our review to whether authors report at least one of the various measures. Overall, 86.0% of the applications describe some form of reliability assessment. We recommend that authors report at least one measure of construct reliability based on estimated model parameters (e.g. composite reliability or average variance extracted) (Bollen, 1989). Cronbach alpha is an inferior measure of reliability because in most cases it is only a lower bound on reliability (Bollen, 1989). In our review we found that Cronbach alpha was frequently presented as proof of unidimensionality. It is not sufficient for this purpose because a scale may not be unidimensional even if it has high reliability (Gerbing and Anderson, 1984). Our review also examined how published research dealt with the issue of discriminant validity. We found that 69.2% of all applications included evidence of discriminant validity. Our review indicates that despite a lack of standardization in the reported measures, most published research in OM includes some measure of reliability, unidimensionality and validity.
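A small sketch of two such parameter-based measures, composite reliability and average variance extracted, computed from standardized loadings per Fornell and Larcker (1981); the loading values are hypothetical.

```python
# Sketch: composite reliability (CR) and average variance extracted
# (AVE) from standardized loadings (hypothetical values).
import numpy as np

loadings = np.array([0.78, 0.81, 0.69, 0.74])  # standardized loadings
errors = 1 - loadings ** 2                     # indicator error variances

cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + errors.sum())
ave = (loadings ** 2).mean()
print(round(cr, 3), round(ave, 3))             # CR ~ 0.84, AVE ~ 0.57
```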

Another way to assess measurement model fit is to evaluate path estimates. In evaluating path estimates, the sign (positive or negative), strength, and significance should be aligned with theory. The magnitude of standard errors associated with path estimates should be small; a large standard error indicates an unstable parameter estimate that is subject to sampling error. Although recommended, the 90% confidence interval (CI) around each path estimate is rarely used in practice, despite being very useful (Browne and Cudeck, 1993). The CI provides an explicit indication of the degree of parameter estimate precision. Additionally, the statistical significance of path estimates can be inferred from the 90% CI: if the 90% CI includes zero, then the path estimate is not significantly different from zero (at a = 0.05). Overall, confidence intervals are very informative and we recommend their use in future studies. In our review, we found that 96.5% of the applications report path coefficients, 62.9% provide t statistics, 14.7% provide standard errors, and 2.1% report confidence intervals.

4.3.1.3. Structural model fit. In SEM models, the latent variable model represents the structural model fit and, generally, the hypotheses of interest. In PA models that do not have LVs, the hypotheses of interest are generally represented by the paths between MVs. As with measurement model fit, the sign, magnitude and statistical significance of the structural path coefficients are examined in testing the hypotheses. Researchers should recognize the important distinction between variance fit (explained variance in endogenous variables as measured by R² for each structural equation) and covariance fit (overall goodness of fit, such as that tested by a χ²-test). Authors emphasize covariance fit a great deal more than variance fit; in our review, 53.1% of the models presented evidence of variance fit compared to 96% that presented at least one index of overall fit. It is important to distinguish between these two types of fit because a model might fit well but not explain a significant amount of variation in endogenous variables or, conversely, fit poorly and explain a large amount of variation in endogenous variables (Fornell, 1983).

In summary, we suggest that fit indices should not be regarded as measures of the usefulness of a model. They each contain some information about model fit but none about model plausibility (Browne and Cudeck, 1993). Rather than establishing that fit indices meet arbitrarily established cut-offs, future research should report a variety of absolute and incremental fit indices for measurement, structural, and overall models and include a discussion of the interpretation of fit indices relative to the study design. We found many instances in which authors conclude that a particular model had better fit than alternative models based on comparing fit indices. While some fit indices can be useful for such comparisons, most commonly employed fit indices cannot be compared across models in this manner (e.g. a model with a lower RMSEA does not indicate better fit than a model with a higher RMSEA). For nested alternate models, the χ² difference test or Target Coefficient can be used (Marsh and Hocevar, 1985). For alternate models that are not nested, parsimony fit measures such as Parsimonious NFI, Parsimonious GFI, Akaike information criterion (AIC) and normed χ² can be used (Hair et al., 1998).
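For the nested case, the χ² difference test is straightforward to compute from reported values; the values below are hypothetical.

```python
# Sketch: chi-square difference test for two nested models (values are
# hypothetical; the more constrained model has the larger chi-square).
from scipy.stats import chi2

chi2_constrained, df_constrained = 112.4, 51
chi2_full, df_full = 96.8, 48

d_chi2 = chi2_constrained - chi2_full   # 15.6
d_df = df_constrained - df_full         # 3
p = chi2.sf(d_chi2, d_df)               # small p: constraints worsen fit
print(d_chi2, d_df, p)
```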


4.3.2. Model respecification
Although no model fits the real world exactly, a desirable outcome in SEM analysis is to show that a hypothesized model provides a good approximation of real world phenomena, as represented by an observed set of data. When an initial model of interest does not satisfy this objective, researchers often alter the model to improve its fit to the data. Modification of a hypothesized model to improve its parsimony and/or fit to the data is termed a specification search (Leamer, 1978; Long, 1983). A specification search is designed to identify and eliminate errors from the original specification of the hypothesized model. Jöreskog and Sörbom (1996) describe three strategies in model specification (and evaluation): (1) strictly confirmatory, where a single a priori model is studied; (2) model generation, where an initial model is fit to data and then modified (frequently with the use of modification indices) until it fits adequately; and (3) alternative models, where multiple a priori models are studied. Although not improper, the strictly confirmatory approach is highly restrictive and does not leave the researcher any latitude if the model does not work. The model generation approach is troublesome because of the potential for abuse, results that lack validity (MacCallum, 1986), and high susceptibility to capitalization on chance (MacCallum et al., 1992). Simulation work by MacCallum (1990) and Homburg and Dobartz (1992) indicates that only half of specification searches (even with correct restrictions and large samples) are successful in recovering the correct underlying model.

In our review, 28.7% (41 of 143) of the applications reported making post hoc changes to respecify the model. We also examined the published articles for inconsistency between the model that was tested and the model described in the text. In 31 out of 143 cases we found such an inconsistency, where we could not match the described model with the tested model. We suspect that in many cases, authors made post hoc changes (perhaps to improve model fit), but those changes were not well described. We found that only 20.3% of the models were tested using alternate models. We recommend that researchers compare

alternate a priori models (either nested or unnested) to uncover the model that the observed data support best, rather than use specification searches (Browne and Cudeck, 1989). Such practices may have a lower probability of identifying models with great fit, but they increase the alignment of modeling results with our existing knowledge and theories. Leading journals must show a willingness to publish poor-fitting models for such advancement of knowledge and theory.

5. Presentation and interpretation of results

We encountered many difficulties related to the presentation and interpretation of models, methods, analysis, and results in our review. In a majority of articles, we had difficulty determining either the complete model (e.g. correlated measurement errors) or the complete set of MVs. Whether the model was fit to a correlation or covariance matrix could not be ascertained for nearly half of the models, and reporting of fit results was incomplete in a majority of models. In addition, issues of causation in cross-sectional designs, generalizability, and confirmation bias also raise concerns and are discussed in detail below.

5.1. Causality

Each of the applications we reviewed used a cross-sectional research design. The debate over whether concurrent measurement of variables can be used to infer causality is vibrant but unresolved (Gollob and Reichardt, 1991; Gollob and Reichardt, 1987; MacCallum and Austin, 2000). One point of agreement is that causal interpretation must be based on the theoretical grounding of, and empirical support for, a model (Pearl, 2000). In light of this ongoing debate, we suggest that OM researchers describe, as clearly as possible, the theory they are testing and the results it is expected to manifest, prior to conducting analysis.

5.2. Generalizability

Generalizability of findings refers to the applicability of findings from one study with a finite, often small, sample to a population (or other populations). Findings from single studies are subject to limitations due to sample or selection effects and their impact on


the conclusions that can be drawn. In our review, such limitations were seldom acknowledged, and results were usually interpreted and discussed as if they were expansively generalizable. Sample and selection effects are controlled (but not eliminated) by identifying a specific population and selecting from it a sample that is appropriate to the objectives of the study. Rather than identifying a specific population, the articles we reviewed focused predominantly on describing their samples. However, a structural equation model is a hypothesis about the structure of relationships among MVs and LVs in a specific population, and this population should be explicitly identified.

Another aspect of generalizability involves replicating the results of a study in a different sample from the same population. We found that 15.4% of the reviewed applications used cross-validation and 18.9% used a split-sample approach. Given the difficulty of obtaining responses from multiple samples from a given population, the expected cross-validation index (ECVI), an index computed from a single sample, can indicate how well a solution obtained in one sample is likely to fit an independent sample from the same population (Browne and Cudeck, 1989; Cudeck and Browne, 1983); a computational sketch appears at the end of this subsection.

Selecting the most appropriate set of measurement items to represent the domain of underlying LVs is critical when using SEM. However, there are few standardized instruments for LVs, making progress in empirical OM research slow and difficult. Appropriate operationalization of LVs is as critical as their repeated use: repetition helps to establish validity and reliability. (For a detailed discussion and guidelines on the selection effects related to good indicators, see Little et al., 1999; for OM measurement scales, see Roth and Schroeder, in press.) A challenging issue arises when researchers are unable to validate previously used scales. In such situations, we suggest a two-pronged strategy. First, the researcher must examine a priori the assumptions employed in developing the previous scales and state their impact on replication. Second, upon failure to replicate with validity, the researcher must use exploratory means to develop modified scales to be validated by future researchers. However, such a respecified model should not be given the status of a hypothesized model and would need to be validated in the future with another sample from the same population.
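As noted above, the ECVI can be computed from a single sample. The following minimal sketch uses a commonly used single-sample approximation, ECVI = (χ² + 2q)/(N − 1), where q is the number of freely estimated parameters (cf. Browne and Cudeck, 1989); the fit values are hypothetical, and the model with the lower ECVI is expected to cross-validate better in a new sample from the same population.

```python
def ecvi(chi2_value: float, n_free_params: int, sample_size: int) -> float:
    """Single-sample expected cross-validation index (lower is better)."""
    return (chi2_value + 2 * n_free_params) / (sample_size - 1)

# Hypothetical fit results for two competing models, N = 250:
for label, c2, q in [("hypothesized", 112.4, 30), ("alternative", 131.9, 27)]:
    print(label, round(ecvi(c2, q, sample_size=250), 3))
```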

5.3. Confirmation bias

Confirmation bias is defined as a prejudice in favor of the evaluated model (Greenwald et al., 1986). Our review suggests that OM researchers (not unlike researchers in other fields) are highly susceptible to confirmation bias: researchers evaluate a single model, give an overly positive evaluation of model fit, and are reluctant to consider alternative explanations of the data. An associated problem in this context is the existence of equivalent models: alternative models that are indistinguishable from the original model in terms of goodness of fit to the data but distinct in their substantive meaning in terms of the underlying theory (MacCallum et al., 1993). In a study of 53 published applications in psychology, MacCallum et al. (1993) showed that equivalent models routinely exist in large numbers and are universally ignored by researchers. To mitigate problems related to confirmation bias, we recommend that OM researchers generate multiple alternate, equivalent models a priori and, if one or more of these models cannot be eliminated for theoretical reasons or poor fit, explicitly discuss the alternate explanation(s) underlying the data rather than confirming and presenting results from one definitive model (MacCallum et al., 1993).

6. Discussion and conclusion

SEM has rapidly become an important and widely used research tool in the OM literature. Its attractiveness to OM researchers can be attributed to two factors: from CFA, SEM draws upon the notion of unobserved or latent variables, and from PA, SEM adopts the notion of modeling direct and indirect relationships. These advantages, combined with the availability of ever more user-friendly software, make it likely that SEM will enjoy widespread use in the future. We have provided both a review of the OM literature employing SEM and a discussion with guidelines for improving its future use. Table 4 contains a summary of some of the most important issues discussed here, their implications, and recommendations for resolving these challenges. Below, we briefly discuss these issues.


Table 4
Implications and recommendations for select SEM issues

Issue: Formative (causal) indicators
Implications: Bollen (1989), MacCallum and Browne (1993): without additional constraints, the model is generally unidentified.
Recommendations: Model as causal indicators (MacCallum and Browne, 1993); report appropriate conditions and modeling issues.

Issue: Poorly developed or weak relationships
Implications: Hurley et al. (1997): more likely to result in a poor-fitting model requiring specification searches and post hoc model respecification.
Recommendations: Use alternative methods that demand less rigorous model specification, such as EFA and regression analysis (Hurley et al., 1997).

Issue: Violating multivariate normality
Implications: MacCallum et al. (1992): inflated goodness-of-fit statistics; underestimated standard errors.
Recommendations: Use estimation methods that adjust for the violation, such as ML, Robust (available in EQS); use estimation methods that do not assume multivariate normality, such as GLS and ADF.

Issue: Correlation matrix as input data
Implications: LISREL is inappropriate without additional constraints (Cudeck, 1989): standard errors, confidence intervals, and test statistics for parameter estimates are incorrect in all cases; parameter estimates and fit indices are incorrect in some cases.
Recommendations: Type of input matrix and software must be reported. RAMONA in SYSTAT (Browne and Mels, 1998), EQS (Bentler, 1989), and SEPATH (Steiger, 1999) can be used; LISREL can be used with additional constraints (LISREL 8.50); AMOS cannot be used.

Issue: Small sample size
Implications: MacCallum et al. (1996), Marsh et al. (1988), Hu and Bentler (1998): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: Conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996). Use fit indices that are less biased by small sample size, such as NNFI; avoid fit indices that are more biased, such as χ², GFI, and NFI (Hu and Bentler, 1998).

Issue: Few degrees of freedom (d.f.)
Implications: MacCallum et al. (1996): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: Report degrees of freedom; conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996); d.f. > 0 is the desirable condition.

Issue: Model identification
Implications: If d.f. = 0, results are not generalizable; if d.f. < 0, the model cannot be estimated unless some parameters are fixed or held constant.
Recommendations: Assess and report model identification; explicitly discuss the implications of unidentified models for the generalizability of results.

Issue: Number of MVs per LV
Implications: To provide adequate representation of the content domain, a sufficient number of MVs per LV is needed.
Recommendations: Have at least three MVs per LV for CFA/SEM (Rigdon, 1995).

Issue: One MV per LV
Implications: May not provide adequate representation of the content domain; poor reliability and validity because error variance cannot be estimated (Maruyama, 1998); the model is generally unidentified.
Recommendations: Model as an MV (not an LV); a single MV can be modeled as an LV only when the MV is a perfect representation of the LV, and specific conditions must be imposed for identification purposes (LISREL 8.50).

Issue: Correlated measurement errors
Implications: Gerbing and Anderson (1984): alters measurement and structural parameter estimates; almost always improves model fit; changes the substantive meaning of the model.
Recommendations: Report correlated errors; justify their theoretical validity a priori; discuss the impact on measurement and structural parameter estimates and model fit.

Issue: Non-recursive models
Implications: Without additional constraints, the model is unidentified.
Recommendations: Explicitly report that the model is non-recursive and its cause; add constraints and report their impact (Long, 1983).
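Several rows of Table 4 recommend conducting and reporting statistical power. The following is a minimal sketch of the RMSEA-based power analysis of MacCallum et al. (1996) for the test of close fit, using the noncentral χ² distribution; the sample sizes, degrees of freedom, and RMSEA values below are illustrative assumptions.

```python
from scipy.stats import ncx2

def power_close_fit(n: int, df: int, rmsea0: float = 0.05,
                    rmsea_a: float = 0.08, alpha: float = 0.05) -> float:
    """Power to reject close fit (H0: RMSEA <= rmsea0) when the
    population RMSEA equals rmsea_a (MacCallum et al., 1996)."""
    lam0 = (n - 1) * df * rmsea0 ** 2    # noncentrality under H0
    lam_a = (n - 1) * df * rmsea_a ** 2  # noncentrality under the alternative
    crit = ncx2.ppf(1 - alpha, df, lam0)  # critical value under H0
    return ncx2.sf(crit, df, lam_a)       # P(reject H0 | alternative true)

for n in (100, 200, 400):  # illustrative sample sizes, df = 40
    print(n, round(power_close_fit(n, df=40), 3))
```

Consistent with Table 4, power rises with both sample size and degrees of freedom, all else being equal.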


As researchers, we should ensure that SEM is the correct method for examining the research question at hand. When theory development is at a nascent stage and patterns of relationships among LVs are relatively weak, SEM should be used with caution so that model confirmation and theory testing do not degenerate into extensive model respecification. Likewise, it is important that we use appropriate measurement methods and understand the distinction between formative and reflective variables.

Determining the minimum sample size depends, in part, on the number of parameters estimated in the hypothesized model. But emerging research in this area indicates that the relationship between sample size and the number of parameter estimates is complex and dependent upon MV characteristics (MacCallum et al., 2001). Likewise, guidelines on degrees of freedom and model identification are not simple or straightforward. Researchers must be cognizant of these issues, and we recommend that all studies discuss them explicitly.

As the powerful capabilities of SEM derive partly from its highly restrictive simplifying assumptions, it is important that assumptions such as normality and skewness are carefully assessed prior to generating an input matrix and conducting analysis; a screening sketch appears below. With regard to model estimation, researchers should recognize that parameter estimates are not fixed values but depend upon the estimation method. For instance, parameter estimates obtained using maximum likelihood are different from those obtained using ordinary least squares (Browne and Arminger, 1995).

Further, in evaluating model fit, the correspondence between the hypothesized model and the observed data should be assessed using a variety of absolute and incremental fit indices for measurement, structural, and overall models. In addition to path coefficients, confidence intervals and standard errors should be assessed. Rather than hypothesizing a single model, multiple alternate models should be evaluated when possible, and research results should be cross-validated using split or multiple samples. Given the very real possibility of alternate, equivalent models, researchers should be cautious not to over-interpret results. Because no model represents the real world exactly, we must be more forthright about the imperfection inherent in any model and acknowledge the literal implausibility of the model more explicitly (MacCallum, 2003).
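As a concrete illustration of the distributional screening recommended above, the following minimal sketch computes univariate skewness and excess kurtosis for each MV and flags values beyond commonly cited cut-offs of |skewness| > 2 and |kurtosis| > 7 (cf. West et al., 1995); the data frame of indicators is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

def screen_normality(indicators: pd.DataFrame) -> pd.DataFrame:
    """Univariate skewness/kurtosis screen for each manifest variable."""
    out = pd.DataFrame({
        "skewness": indicators.apply(skew),
        "kurtosis": indicators.apply(kurtosis),  # excess kurtosis (normal = 0)
    })
    out["flagged"] = (out["skewness"].abs() > 2) | (out["kurtosis"].abs() > 7)
    return out

rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.normal(size=300),
                     "x2": rng.exponential(size=300) ** 2})  # deliberately nonnormal
print(screen_normality(data))
```

Flagged indicators call for transformation or for estimation methods that do not assume multivariate normality (see Table 4).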

reporting of results and, in numerous instances, our inability to reconstruct the tested model based on the description in the text and the reported degrees of freedom. These issues can be resolved by attention to published guidelines for presenting the results of SEM (e.g. Hoyle and Panter, 1995). To assist both during the review process and in building a cumulative tradition in the OM field, sufficient information needs to be provided to understand (1) the population from which the data sample was obtained, (2) the distribution of the data, (3) the hypothesized measurement and structural models, and (4) the statistical results that corroborate the subsequent interpretation and conclusions.

We recommend that every published application of SEM provide a clear and complete specification of the model(s) and variables, preferably in the form of a graphical figure, including the measurement model linking LVs to MVs, the structural model connecting LVs, and a specification of which parameters are being estimated and which are fixed. It is helpful to identify specific research hypotheses on the graphical figure, both to clarify the model and to reduce the text needed to describe them. In addition to including a statement about the type of input data matrix, software, and estimation method used, we recommend that the input matrix be included in the paper for future replications and meta-analytical research studies, though we recognize this is an editorial decision subject to space constraints. In terms of statistical results, we suggest researchers include multiple measures of fit and criteria for evaluating fit, along with parameter estimates and their associated confidence intervals and standard errors. Finally, interpretation of results should be guided by an understanding that models are imperfect and cannot be made to be exactly correct.
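To illustrate the kind of fit reporting suggested above, the following minimal sketch computes two widely reported indices, RMSEA and CFI, from χ² statistics using their standard formulas; the model, baseline (independence) model, and sample-size values below are hypothetical.

```python
from math import sqrt

def rmsea(chi2_m: float, df_m: int, n: int) -> float:
    """Root mean square error of approximation."""
    return sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# Hypothetical chi-square results for a model and its baseline, N = 250:
print(f"RMSEA = {rmsea(112.4, 48, 250):.3f}, "
      f"CFI = {cfi(112.4, 48, 1450.0, 66):.3f}")
```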


We can enrich our knowledge by reviewing the use of SEM in more mature research fields such as psychology and marketing, including their methodological advances. Some advances worthy of mention are validation studies using the multi-trait multi-method (MTMM) matrix method (cf. Cudeck, 1988; Widaman, 1985), measurement invariance (Widaman and Reise, 1997), and the use of categorical (Muthen, 1983) or experimental data (Russell et al., 1998).

Our review of published SEM applications in the OM literature suggests that while reporting has improved over time, we need to pay greater attention to methodological issues in using SEM. Like any statistical technique or tool, SEM must be used prudently if researchers want to take full advantage of its potential. SEM is a useful tool "to represent multidimensional unobservable constructs and simultaneously examine structural relationships that are not well captured by traditional research methods" (Gefen et al., 2000, p. 6). In the future, utilizing the guidelines presented here will improve the use of SEM in OM research and, thus, our collective understanding of OM theory and practice.


Acknowledgements We thank Michael Browne and Sriram Thirumalai for helpful comments on this paper. We also thank Carlos Rodriguez for assistance with article screening and data coding.

Appendix A. Mathematical specification of structural equation modeling

A structural equation model can be defined as a hypothesis of a specific pattern of relations among a set of measured variables (MVs) and latent variables (LVs). The three equations presented below are fundamental to SEM. Eq. (1) represents the directional influences of the exogenous LVs (ξ) on their indicators (x). Eq. (2) represents the directional influences of the endogenous LVs (η) on their indicators (y). Thus, Eqs. (1) and (2) link the observed (manifest) variables to unobserved (latent) variables through a factor analytic model and constitute the measurement portion of the model. Eq. (3) represents the endogenous LVs (η) as linear functions of other exogenous LVs (ξ) and endogenous LVs plus residual terms (ζ). Thus, Eq. (3) specifies relationships between LVs through a structural equation model and constitutes the structural portion of the model:

x = Λx ξ + δ    (1)
y = Λy η + ε    (2)
η = Bη + Γξ + ζ    (3)

where x is the vector of measures of exogenous manifest variables, Λx the matrix of effects of exogenous LVs on their MVs, δ the errors of measurement in exogenous manifest variables, y the measures of endogenous manifest variables, Λy the matrix of effects of endogenous LVs on their MVs, ε the errors of measurement in endogenous manifest variables, ξ the latent exogenous constructs, η the latent endogenous constructs, Γ the matrix of effects of exogenous constructs on endogenous constructs, B the matrix of effects of endogenous constructs on each of the other endogenous constructs, and ζ the errors in equations (residuals).

It is also necessary to define the following covariance matrices:

(a) Φ = E(ξξ′), the covariance matrix of the exogenous LVs;
(b) Θδ = E(δδ′), the covariance matrix of the measurement errors in the exogenous MVs;
(c) Θε = E(εε′), the covariance matrix of the measurement errors in the endogenous MVs;
(d) Ψ = E(ζζ′), the covariance matrix of the errors in equations for the endogenous LVs.

Given this mathematical representation, it can be shown that the population covariance matrix of the MVs is a function of eight parameter matrices: Λx, Λy, Γ, B, Φ, Θδ, Θε, and Ψ. Thus, given a hypothesized model in terms of the fixed and free parameters of the eight parameter matrices, and given a sample covariance matrix for the MVs, one can solve for estimates of the free parameters of the model. The most common approach for fitting the model to data is to obtain maximum likelihood estimates of the parameters, with an accompanying likelihood ratio χ²-test of the null hypothesis that the model holds in the population. The notation above uses SEM as developed by Jöreskog (1974) and represented in LISREL (Jöreskog and Sörbom, 1996).
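To make the mapping from parameter matrices to the population covariance matrix concrete, the following is a minimal numpy sketch, with small, hypothetical parameter values, of the model-implied covariance matrix of the MVs, where A = (I - B)^(-1) (cf. Bollen, 1989).

```python
import numpy as np

def implied_covariance(Lx, Ly, Gamma, B, Phi, Theta_d, Theta_e, Psi):
    """Model-implied covariance matrix of (x, y) from the eight
    parameter matrices of the LISREL model:
      cov(x)    = Lx Phi Lx' + Theta_d
      cov(y, x) = Ly A Gamma Phi Lx'
      cov(y)    = Ly A (Gamma Phi Gamma' + Psi) A' Ly' + Theta_e
    """
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    cov_xx = Lx @ Phi @ Lx.T + Theta_d
    cov_yx = Ly @ A @ Gamma @ Phi @ Lx.T
    cov_yy = Ly @ A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T @ Ly.T + Theta_e
    return np.block([[cov_xx, cov_yx.T], [cov_yx, cov_yy]])

# One exogenous LV with two MVs and one endogenous LV with two MVs:
Lx = np.array([[1.0], [0.8]]); Ly = np.array([[1.0], [0.9]])
Gamma = np.array([[0.5]]);     B = np.zeros((1, 1))
Phi = np.array([[1.0]]);       Psi = np.array([[0.75]])
Theta_d = np.diag([0.3, 0.4]); Theta_e = np.diag([0.3, 0.4])
print(implied_covariance(Lx, Ly, Gamma, B, Phi, Theta_d, Theta_e, Psi))
```

Fitting the model amounts to choosing the free parameters so that this implied matrix best reproduces the sample covariance matrix, for example by minimizing the maximum likelihood discrepancy.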

References
Anderson, J.C., Gerbing, D.W., 1984. The effects of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika 49, 155–173.
Anderson, J.C., Gerbing, D.W., 1988. Structural equation modeling in practice: a review and recommended two-step approach. Psychological Bulletin 103 (3), 411–423.


Bagozzi, R.P., Heatherton, T.F., 1994. A general approach to representing multifaceted personality constructs: application to state self-esteem. Structural Equation Modeling 1 (1), 35–67.
Bagozzi, R.P., Yi, Y., 1988. On the evaluation of structural equation models. Journal of the Academy of Marketing Science 16 (1), 74–94.
Barman, S., Hanna, M.D., LaForge, R.L., 2001. Perceived relevance and quality of POM journals: a decade later. Journal of Operations Management 19 (3), 367–385.
Baumgartner, H., Homburg, C., 1996. Applications of structural equation modeling in marketing and consumer research: a review. International Journal of Research in Marketing 13 (2), 139–161.
Bentler, P.M., 1989. EQS: Structural Equations Program Manual. BMDP Statistical Software, Los Angeles, CA.
Bentler, P.M., Chou, C.P., 1987. Practical issues in structural modeling. Sociological Methods and Research 16 (1), 78–117.
Bollen, K.A., 1989. Structural Equations with Latent Variables. Wiley, New York.
Bollen, K.A., Lennox, R., 1991. Conventional wisdom on measurement: a structural equation perspective. Psychological Bulletin 110, 305–314.
Brannick, M.T., 1995. Critical comments on applying covariance structure modeling. Journal of Organizational Behavior 16 (3), 201–213.
Brown, R.L., 1994. Efficacy of the indirect approach for estimating structural equation models with missing data: a comparison of five methods. Structural Equation Modeling 1, 287–316.
Browne, M.W., Arminger, G., 1995. Specification and estimation of mean and covariance structure models. In: Arminger, G., Clogg, C.C., Sobel, M.E. (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences. Plenum, New York, pp. 185–249.
Browne, M.W., Cudeck, R., 1989. Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research 24 (4), 445–455.
Browne, M.W., Cudeck, R., 1993. Alternative ways of assessing model fit. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 136–161.
Browne, M.W., MacCallum, R.C., Kim, C., Anderson, B.L., Glaser, R., 2002. When fit indices and residuals are incompatible. Psychological Methods 7 (4), 403–421.
Browne, M.W., Mels, G., 1998. Path analysis: RAMONA. In: SYSTAT for Windows: Advanced Applications (Version 8). SYSTAT, Evanston, IL.
Chin, W.W., 1998. Issues and opinion on structural equation modeling. MIS Quarterly 22 (1), vii–xvi.
Chin, W.W., Todd, P.A., 1995. On the use, usefulness, and ease of use of structural equation modeling in MIS research: a note of caution. MIS Quarterly 19 (2), 237–246.
Cohen, P., Cohen, J., Teresi, J., Marchi, M., Velez, C.N., 1990. Problems in the measurement of latent variables in structural equations causal models. Applied Psychological Measurement 14 (2), 183–196.
Cudeck, R., 1988. Multiplicative models and MTMM matrices. Multivariate Behavioral Research 13, 131–147.
Cudeck, R., 1989. Analysis of correlation matrices using covariance structure models. Psychological Bulletin 105, 317–327.


Cudeck, R., Browne, M.W., 1983. Cross-validation of covariance structures. Multivariate Behavioral Research 18 (2), 147–167.
Enders, C.K., Bandalos, D.L., 2001. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling 8 (3), 430–457.
Fornell, C., 1983. Issues in the application of covariance structure analysis. Journal of Consumer Research 9 (4), 443–448.
Fornell, C., Larcker, D.F., 1981. Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research 18 (1), 39–50.
Fornell, C., Rhee, B., Yi, Y., 1991. Direct regression, reverse regression, and covariance structural analysis. Marketing Letters 2 (3), 309–320.
Garver, M.S., Mentzer, J.T., 1999. Logistics research methods: employing structural equation modeling to test for construct validity. Journal of Business Logistics 20 (1), 33–57.
Gefen, D., Straub, D.W., Boudreau, M., 2000. Structural equation modeling and regression: guidelines for research practice. Communications of the AIS 1 (7), 1–78.
Gerbing, D.W., Anderson, J.C., 1984. On the meaning of within-factor correlated measurement errors. Journal of Consumer Research 11, 572–580.
Goh, C., Holsapple, C.W., Johnson, L.E., Tanner, J.R., 1997. Evaluating and classifying POM journals. Journal of Operations Management 15 (2), 123–138.
Gollob, H.F., Reichardt, C.S., 1987. Taking account of time lags in causal models. Child Development 58 (1), 80–92.
Gollob, H.F., Reichardt, C.S., 1991. Interpreting and estimating indirect effects assuming time lags really matter. In: Collins, L.M., Horn, J.L. (Eds.), Best Methods for the Analysis of Change. American Psychological Association, Washington, DC, pp. 243–259.
Greenwald, A.G., Pratkanis, A.R., Leippe, M.R., Baumgartner, M.H., 1986. Under what conditions does theory obstruct research progress? Psychological Review 93 (2), 216–229.
Hair Jr., J.H., Anderson, R.E., Tatham, R.L., Black, W.C., 1998. Multivariate Data Analysis. Prentice-Hall, New Jersey.
Hershberger, S.L., 2003. The growth of structural equation modeling: 1994–2001. Structural Equation Modeling 10 (1), 35–46.
Homburg, C., Dobartz, A., 1992. Covariance structure analysis via specification searches. Statistical Papers 33 (1), 119–142.
Hoyle, R.H., Panter, A.T., 1995. Writing about structural equation modeling. In: Hoyle, R.H. (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications. Sage, Thousand Oaks, CA, pp. 158–176.
Hu, L., Bentler, P.M., 1998. Fit indices in covariance structure modeling: sensitivity to under-parameterized model misspecification. Psychological Methods 3 (4), 424–453.
Hurley, A.E., Scandura, T.A., Schriesheim, C.A., Brannick, M.T., Seers, A., Vandenberg, R.J., Williams, L.J., 1997. Exploratory and confirmatory factor analysis: guidelines, issues, and alternatives. Journal of Organizational Behavior 18 (6), 667–683.
Jackson, D.L., 2003. Revisiting the sample size and number of parameter estimates: some support for the N:q hypothesis. Structural Equation Modeling 10 (1), 128–141.


Jöreskog, K.G., 1969. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34 (2, Part 1), 183–202.
Jöreskog, K.G., 1974. Analyzing psychological data by structural analysis of covariance matrices. In: Atkinson, R.C., Krantz, D.H., Luce, R.D., Suppes, P. (Eds.), Contemporary Developments in Mathematical Psychology, vol. II. W.H. Freeman, San Francisco, pp. 1–56.
Jöreskog, K.G., 1993. Testing structural equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 294–316.
Jöreskog, K.G., Sörbom, D., 1996. LISREL 8: User's Reference Guide. Scientific Software International Inc., Chicago, IL.
Leamer, E.E., 1978. Specification Searches: Ad-hoc Inference with Non-experimental Data. Wiley, New York.
Lei, M., Lomax, R.G., 2005. The effect of varying degrees of nonnormality in structural equation modeling. Structural Equation Modeling 12 (1), 1–27.
Little, T.D., Lindenberger, U., Nesselroade, J.R., 1999. On selecting indicators for multivariate measurement and modeling with latent variables: when good indicators are bad and bad indicators are good. Psychological Methods 4 (2), 192–211.
Long, J.S., 1983. Covariance Structure Models: An Introduction to LISREL. Sage, Beverly Hills, CA.
MacCallum, R.C., 1986. Specification searches in covariance structure modeling. Psychological Bulletin 100 (1), 107–120.
MacCallum, R.C., 1990. The need for alternative measures of fit in covariance structure modeling. Multivariate Behavioral Research 25 (2), 157–162.
MacCallum, R.C., 2003. Working with imperfect models. Multivariate Behavioral Research 38 (1), 113–139.
MacCallum, R.C., Austin, J.T., 2000. Applications of structural equation modeling in psychological research. Annual Review of Psychology 51 (1), 201–226.
MacCallum, R.C., Browne, M.W., 1993. The use of causal indicators in covariance structure models: some practical issues. Psychological Bulletin 114 (3), 533–541.
MacCallum, R.C., Browne, M.W., Sugawara, H.M., 1996. Power analysis and determination of sample size for covariance structure modeling. Psychological Methods 1 (1), 130–149.
MacCallum, R.C., Roznowski, M., Necowitz, L.B., 1992. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychological Bulletin 111 (3), 490–504.
MacCallum, R.C., Wegener, D.T., Uchino, B.N., Fabrigar, L.R., 1993. The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin 114 (1), 185–199.
MacCallum, R.C., Widaman, K.F., Preacher, K.J., Hong, S., 2001. Sample size in factor analysis: the role of model error. Multivariate Behavioral Research 36 (4), 611–637.
Malhotra, M.K., Grover, V., 1998. An assessment of survey research in POM: from constructs to theory. Journal of Operations Management 16 (4), 407–425.

Marsh, H.W., 1998. Pairwise deletion for missing data in structural equation models: nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling 5, 22–36.
Marsh, H.W., Balla, J.R., McDonald, R.P., 1988. Goodness-of-fit indexes in confirmatory factor analysis: the effect of sample size. Psychological Bulletin 103 (3), 391–410.
Marsh, H.W., Hocevar, D., 1985. Applications of confirmatory factor analysis to the study of self concept: first and higher order factor models and their invariance across groups. Psychological Bulletin 97, 562–582.
Maruyama, G., 1998. Basics of Structural Equation Modeling. Sage, Thousand Oaks, CA.
Medsker, G.J., Williams, L.J., Holahan, P., 1994. A review of current practices for evaluating causal models in organizational behavior and human resources management research. Journal of Management 20 (2), 439–464.
Mulaik, S.S., James, L.R., Van Alstine, J., Bennett, N., Lind, S., Stillwell, C.D., 1989. An evaluation of goodness of fit indices for structural equation models. Psychological Bulletin 105 (3), 430–445.
Muthen, B., 1983. Latent variable structural equation modeling with categorical data. Journal of Econometrics 22 (1/2), 43–66.
Muthen, B., Kaplan, D., Hollis, M., 1987. On structural equation modeling with data that are not missing completely at random. Psychometrika 52, 431–462.
Pearl, J., 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK.
Rigdon, E.E., 1995. A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research 30 (3), 359–383.
Roth, A., Schroeder, R., in press. Handbook of Multi-item Scales for Research in Operations Management. Sage.
Russell, D.W., Kahn, J.H., Spoth, R., Altmaier, E.M., 1998. Analyzing data from experimental studies: a latent variable structural equation modeling approach. Journal of Counseling Psychology 45, 18–29.
Satorra, A., 2001. Goodness of fit testing of structural equations models with multiple group data and nonnormality. In: Cudeck, R., du Toit, S., Sörbom, D. (Eds.), Structural Equation Modeling: Present and Future. Scientific Software International, Lincolnwood, IL, pp. 231–256.
Sedlmeier, P., Gigerenzer, G., 1989. Do studies of statistical power have an effect on the power of the studies? Psychological Bulletin 105 (2), 309–316.
Shook, C.L., Ketchen, D.J., Hult, G.T.M., Kacmar, K.M., 2004. An assessment of the use of structural equation modeling in strategic management research. Strategic Management Journal 25 (4), 397–404.
Soteriou, A.C., Hadijinicola, G.C., Patsia, K., 1998. Assessing production and operations management related journals: the European perspective. Journal of Operations Management 17 (2), 225–238.
Steiger, J.H., 1999. Structural equation modeling (SEPATH). In: Statistica for Windows, vol. III. StatSoft, Tulsa, OK.

Steiger, J.H., 2001. Driving fast in reverse. Journal of the American Statistical Association 96, 331–338.
Tanaka, J.S., 1987. How big is big enough? Sample size and goodness of fit in structural equation models with latent variables. Child Development 58, 134–146.
Tanaka, J.S., 1993. Multifaceted conceptions of fit in structural equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 10–39.
Teel, J.E., Bearden, W.O., Sharma, S., 1986. Interpreting LISREL estimates of explained variance in non-recursive structural equation models. Journal of Marketing Research 23 (2), 164–168.
Vokurka, R.J., 1996. The relative importance of journals used in operations management research: a citation analysis. Journal of Operations Management 14 (3), 345–355.


West, S.G., Finch, J.F., Curran, P.J., 1995. Structural equation models with nonnormal variables: problems and remedies. In: Hoyle, R.H. (Ed.), Structural Equation Modeling: Issues, Concepts, and Applications. Sage, Newbury Park, CA, pp. 56–75.
Widaman, K.F., 1985. Hierarchically nested covariance structure models for multitrait-multimethod data. Applied Psychological Measurement 9, 1–26.
Widaman, K.F., Reise, S., 1997. Exploring the measurement invariance of psychological instruments: applications in the substance use domain. In: Bryant, K.J., Windle, M., West, S.G. (Eds.), The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse. American Psychological Association, Washington, DC, pp. 281–324.
