Toward a Further Understanding of and Improvement
in Measurement Invariance Methods and Procedures

ROBERT J. VANDENBERG
University of Georgia
Starting some years prior to its publication, but most certainly since the
appearance of the article by Vandenberg and Lance (2000), the topic of measurement
equivalence/invariance (ME/I) has been the catalyst of many stimulating conversa-
tions. Some of these conversations have been consultative in that input has been sought
or the author has sought it of others; some have been challenges in that an aspect of the
topic is questioned (e.g., the analytical procedures, its applicability, etc.); and others
have been of the comparative variety (e.g., differences between ME/I and differential
item functioning). Although the conversations had different origins, they shared a
common element. That is, they all started with questions denoting a need for continued
research on ME/I, not only on its underlying analytical procedures but also on its
applicability. In other words, the questions all highlighted deficits in our understanding
of the conditions that make ME/I most appropriate, and what its limitations may be
with respect to its application. As stated by Little (2000, p. 218), "many issues still
need to be considered and examined before these procedures can be used unequivocally."

Author's Note: The author was supported in part by a grant from the Centers for Disease Control and Prevention (CDC) and the National Institute for Occupational Safety and Health (NIOSH) (Grant No. 1R010H03737-01A1; D. M. DeJoy, principal investigator). Its contents are the sole responsibility of the author and do not necessarily represent the official views of the CDC or NIOSH. The author wishes to thank the membership of the Research Methods Division of the Academy of Management for appointing him as division chair. The current article represents an opportunity born out of that appointment and, as such, would not have been possible without their support. The author also wishes to thank the editor for providing the platform to present these thoughts. Correspondence concerning this article should be addressed to Robert J. Vandenberg, Terry College of Business, Department of Management, University of Georgia, Athens, GA 30602-6256; e-mail: rvandenb@terry.uga.edu.

Organizational Research Methods, Vol. 5 No. 2, April 2002 139-158
© 2002 Sage Publications
The primary goal of this article is to provide a synopsis of the key questions that
might form the basis of future research. There are two motives underlying this goal.
First, addressing each question through research is more than one or two persons could
handle. Hence, the motive is to stimulate others to make addressing these questions a
component of their programs of research. Some of the following ideas are more care-
fully thought out than others, and some are already subjects of ongoing programmatic
research efforts. Nonetheless, the material is presented to help push thinking on these
issues. The second motive is less altruistic and more cautionary. Namely, as is so often
the case when something new enters the methodological arena (ME/I analytical pro-
cedures are not new, but they have been experiencing an awakening within organiza-
tional research), there is adoption fervor. A positive aspect of the fervor is that
researchers are attending closely to the topic and, therefore, accepting the premise that
it is warranted. A negative aspect of this fervor, however, is characterized by unques-
tioning faith on the part of some that the technique is correct or valid under all circum-
stances. Thus, the second motive is to temper this side of the fervor by noting that the
reason so many conversations occur is that there are a host of unknowns underlying
ME/I. Furthermore, until research addresses the unknowns, potential adopters are
asked to monitor the literature closely or run the risk of inappropriately applying it, or
perhaps inferring something about their data that is simply not accurate.
Making no claims that these are exhaustive, the following issues are introduced: (a)
the sensitivity and susceptibility of the ME/I analytical procedures; (b) partial metric
invariance; and (c) triggers, antecedents, and causes of not supporting ME/I when sup-
port is desired. The author also makes no claim of credit for these issues. Rather, these
are thoughts spanning literally years of conversations with David Chan, Gordon
Cheung, Charles Lance, Roger Rensvold, and Neal Schmitt, all individuals who have
themselves devoted large amounts of time to the ME/I topic. There have been conver-
sations with many others, such as Kathleen Bentein, Jeff Edwards, Mark Gavin, Larry
James, Hettie Richardson, Chris Riordan, and Larry Williams. Thus, it is the author's
preference to view this as an attempt to share with the reader a slate of ideas born out of
all of these individuals' perspectives. Immediately following is a brief overview of the
types or levels of ME/I tests. Details for each type are provided in Vandenberg and
Lance (2000).
Overview of ME/I
Borrowing liberally from Vandenberg and Lance (2000), the need for conducting
ME/I tests was justified on the premise that there are circumstances that threaten the
quality of our measurement tools, but are not directly addressable through classical
test theory approaches such as the calculation of reliability coefficients. It was further
argued that rather than test for the presence of such circumstances, most researchers to
date have simply ignored their potential presence and, thereby, assumed them away.
Example questions of possible situations in which these circumstances are highly
probable and, therefore, require the use of ME/I test procedures are as follows: (a) Do
The general question of invariance of measurement is one of whether or not, under dif-
ferent conditions of observing and studying phenomena, measurements yield mea-
sures of the same attributes. If there is no evidence indicating presence or absence of
measurement invariance (the usual case) or there is evidence that such invariance
does not obtain, then the basis for drawing scientific inference is severely lacking:
findings of differences between individuals and groups cannot be unambiguously
interpreted. (p. 117, emphasis added)
Vandenberg and Lance (2000) organized the ME/I tests into the following eight-step sequence:

1. An omnibus test of the equality of covariance matrices across groups, that is, a test of
the null hypothesis of invariant covariance matrices (i.e., Σ(g) = Σ(g′)), where g and g′
index the groups.
2. A test of configural invariance, that is, of the null hypothesis that the same pattern of
fixed and free factor loadings holds across groups.
3. A test of metric invariance, that is, of the null hypothesis that like items' factor loadings
are invariant across groups (i.e., Λ(g) = Λ(g′)).
4. A test of scalar invariance, that is, of the null hypothesis that like items' intercepts are
invariant across groups (i.e., τ(g) = τ(g′)).
5. A test of the null hypothesis that like items' unique variances are invariant across groups
(i.e., Θ(g) = Θ(g′)). Tests (3) through (5) follow the same sequence as that recommended by
Gulliksen and Wilks (1950) for tests of homogeneity of regression models across
groups and should be regarded as being similarly sequential, so that tests of scalar
invariance should be conducted only if (at least partial) metric invariance is established,
and tests of invariant uniquenesses should proceed only if (at least partial) metric and
scalar invariance has been established first.
6. A test of the null hypothesis that factor variances were invariant across groups (i.e.,
φjj(g) = φjj(g′)). This was sometimes treated as a complement to test (3), where differences in fac-
tor variances were interpreted as reflecting group differences in the calibration of true
scores (e.g., Schaubroeck & Green, 1989; Schmitt, 1982; Vandenberg & Self, 1993).
7. A test of the null hypothesis that factor covariances were invariant across groups (i.e.,
φjj′(g) = φjj′(g′)). This was sometimes treated as a complement to test (2).
8. A test of the null hypothesis of invariant factor means across groups (i.e., κ(g) = κ(g′)), which
often was invoked as a way to test for differences between groups in level on the con-
struct of interest.
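Movement from one step of this sequence to the next is typically evaluated with a chi-square difference test between adjacent nested models. A minimal sketch follows; the fit statistics below are invented for illustration and are not taken from any study:

```python
from scipy import stats

# Hypothetical fit statistics for two nested multigroup models: the
# configural model (loadings free in each group) and the metric model
# (like items' loadings constrained equal across groups).
chisq_configural, df_configural = 212.4, 104
chisq_metric, df_metric = 227.9, 112

# The constrained (metric) model is rejected, and metric invariance judged
# untenable, when the chi-square difference is improbably large for the
# difference in degrees of freedom.
d_chisq = chisq_metric - chisq_configural
d_df = df_metric - df_configural
p = stats.chi2.sf(d_chisq, d_df)
print(f"delta chi-square = {d_chisq:.1f} on {d_df} df, p = {p:.3f}")
```

The same comparison logic applies at each adjacent pair of steps (e.g., metric versus scalar), with the more constrained model always tested against the model one step less constrained.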
Other observations made by the authors included the fact that all eight tests were
rarely conducted in the same study; rather, researchers chose tests based on their par-
ticular research needs. Furthermore, the most frequently conducted tests were those
for configural and metric invariance. By far the most important observation made by
Vandenberg and Lance (2000), however, was the fact that there were many cases in
their collection of studies where inaccurate inferences would have been made by the
various researchers if they had not undertaken the ME/I tests. That is, if the various re-
searchers had simply undertaken the tests of substantive interest (usually a comparison
of means) without first undertaking the ME/I tests, their conclusions would not have
been accurate. Even in cases where invariance was not an issue after being tested, the
authors could conclude much more confidently that observed differences were a func-
tion of the substantive phenomenon (i.e., were valid) and not due to some measure-
ment artifact. Regardless of the power of the ME/I procedures, however, there remain a
number of questions or issues that themselves highlight the need for continued re-
search on those procedures so that even greater confidence in their power is gained. As
noted above, attention to these questions came about through conversations, and it is to
these that the article now turns.
Sensitivity
Dealing with the sensitivity issue resides at the top of the to-do list for one reason.
Namely, as it currently stands, it is uncertain how sensitive the analyses are to detecting
the true underlying phenomena.

[Figure: Decision flowchart for the sequence of ME/I tests. The flow runs from the
omnibus test of invariant covariance matrices (Σ(g) = Σ(g′)), where support means no
further ME/I tests are warranted, to the test of full metric invariance (Λ(g) = Λ(g′));
failing that, constraints are relaxed to test partial invariance, and if partial invariance
also fails, the groups are not comparable. Supported full or partial invariance leads on
to tests of uniqueness invariance (a test of reliability, or of the homogeneity of
uniquenesses; Θ(g) = Θ(g′)) and factor covariance invariance (φ(g) = φ(g′)), and
finally to comparisons of latent means or of structural parameters.]

Although the sensitivity issue is relevant to all eight
levels of ME/I tests, it is addressed here relative to the two most widely acknowledged
phenomena: (a) detecting shifts in conceptual frames of reference across time or
differences in them between groups and (b) the ability to determine whether the
measurement tool is being calibrated differently to the true score either across time within the
same group or between different groups. The former is referred to as configural
invariance, and the latter as metric invariance in the above overview of the ME/I tests.
For configural invariance, researchers have accepted as evidence supporting invariance
or equivalence, a lack of difference in the pattern of fixed and free factor loadings
between groups or across time within a group (Byrne, Shavelson, & Muthén, 1989;
Horn & McArdle, 1992; Rensvold & Cheung, 2001; Schmitt, 1982; Vandenberg &
Lance, 2000; Vandenberg & Self, 1993). Most frequently cited evidence is whether
specifying the same factor pattern matrix between groups or within a group across time
results in strong fit indices (Vandenberg & Lance, 2000). Less often cited is whether
the factor covariances are the same (Schmitt, 1982; Schaubroeck & Green, 1989;
Vandenberg & Self, 1993). The premise underlying the analyses is that the factor pat-
tern matrix is a reasonable empirical representation of the cognitive frame of reference
against which the individual responded to the set of items, and hence tests of difference
in the factor pattern between groups or shifts across time within a group are also rea-
sonable tests of the equivalence in the conceptual or configural frame of reference.
But is it? More specifically, has anyone unequivocally demonstrated that these
analyses really do detect shifts or differences in conceptual frames of reference? A
review of the journals failed to uncover a direct test of this issue. A related question is,
"Even if the analytical procedures do as claimed, what thresholds need to be reached
for a true difference or shift to exist?"
Putting these questions aside for the moment, similar issues exist for the detection
of the second phenomenon: metric invariance, or differences between groups or shifts
across time in how the observed score (typically a response to an item from a multi-
item operationalization of a construct) is calibrated to the true score. Tests for metric
invariance may only be undertaken if configural invariance is supported (Rensvold &
Cheung, 2001; Vandenberg & Lance, 2000). It makes little sense to test for differences
in calibration to the true score (i.e., the conceptual frame of reference) if the true score
is different between groups or has shifted across time within a group. For metric
invariance, researchers have accepted as evidence supporting invariance, a lack of dif-
ference in the magnitude of the factor loadings for like items between groups or across
time within a group (Byrne et al., 1989; Horn & McArdle, 1992; Rensvold & Cheung,
2001; Schmitt, 1982; Vandenberg & Lance, 2000; Vandenberg & Self, 1993). Most
frequently cited evidence is whether constraining loadings of like items to be equal
between groups or within a group across time results in a model that has equally strong
fit to the data as a model in which the factor loadings are freely estimated (i.e., the
configural invariance model) (Vandenberg & Lance, 2000). Less often cited is whether
the factor variances are the same (Schaubroeck & Green, 1989; Schmitt, 1982;
Vandenberg & Self, 1993). The premise underlying the analyses is that the factor load-
ings are the regression slopes that represent the relationship of item responses (or item
parcels) to their respective latent variables and, thus, represent the expected change in
the response (observed score) for every unit change in the latent variable (Bollen,
1989; Vandenberg & Lance, 2000). In this vein, the loading is accepted as a reasonable
empirical representation of the conceptual scaling used to make responses to an item,
and hence the tests of difference between groups or shifts across time within a group
are also reasonable tests of the equality with which the conceptual frame of reference
influences item set responding (Jöreskog, 1969; Schmitt, 1982; Vandenberg & Lance,
2000; Vandenberg & Self, 1993).
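The slope interpretation can be made concrete with a small simulation. In the sketch below (the loading values, uniqueness variance, and sample size are arbitrary choices for illustration), the same one-unit change in the latent variable produces a different expected change in the observed response in each group, which is exactly the kind of difference a metric invariance test is meant to catch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent true scores for two groups, identically distributed.
xi_a = rng.normal(0.0, 1.0, n)
xi_b = rng.normal(0.0, 1.0, n)

# Hypothetical loadings for the same item: group B calibrates the item
# to the true score only half as strongly, violating metric invariance.
lam_a, lam_b = 0.8, 0.4

# Observed item responses: x = lambda * xi + delta (uniqueness).
x_a = lam_a * xi_a + rng.normal(0.0, 0.6, n)
x_b = lam_b * xi_b + rng.normal(0.0, 0.6, n)

# The regression slope of the item on the latent variable recovers the
# loading: the expected change in response per unit change in the construct.
slope_a = np.cov(x_a, xi_a)[0, 1] / np.var(xi_a, ddof=1)
slope_b = np.cov(x_b, xi_b)[0, 1] / np.var(xi_b, ddof=1)
print(round(slope_a, 2), round(slope_b, 2))
```

With identical latent distributions, any difference in these slopes reflects calibration, not the construct itself.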
However, once again, the question is, "Is it?" More specifically, has anyone
unequivocally demonstrated that the metric invariance analyses actually detect shifts
or differences in scaling units? As with configural invariance, an immediate answer to
this question was not found in a search of the literature. The related question is
identical to that stated previously. Namely, "Even if the analytical procedures do as
claimed, what thresholds need to be reached for a true difference or shift to exist?"
The questions point to an issue that may be referred to as sensitivity. The impor-
tance of sensitivity becomes fairly obvious when its effects on the ability to meaning-
fully interpret ME/I outcomes and the decisions as to what to do next as a result of that
interpretation are examined. Two scenarios are presented to highlight the importance
of attending to the sensitivity issue. The first scenario is the one most familiar to read-
ers, that is, viewing a failure to support ME/I as an undesirable outcome. ME/I tests in
this scenario are employed as a pretest to evaluate assumptions of invariance before
undertaking the analyses evaluating an a priori hypothesis such as testing the differ-
ences between two groups on some conceptually meaningful variable. Specifically,
Vandenberg and Lance (2000) noted that tests for ME/I have been undertaken most fre-
quently in situations where researchers were concerned that invariance may not be
supported (when support is the desired goal) in the presence of individual or group dif-
ferences such as gender (e.g., Stacy, MacKinnon, & Pentz, 1993), age (e.g., Marsh,
1993), race (e.g., Chan, 1997), performance evaluation sources (e.g., Van Dyne &
LePine, 1998), and culture (e.g., Riordan & Vandenberg, 1994). Namely, whereas the
pattern of responses for one group may support the conceptual premises underlying
the focal measurement instruments (e.g., U.S. employee responses to an organiza-
tional commitment instrument), the pattern of the other group's responses may differ
dramatically from conceptual expectations (e.g., Chinese employee responses to the
same instrument). An immediate response (and indeed the current recommendation)
to not supporting configural invariance is to say that the groups may not be compared
at all; assuming, of course, that a comparison was desired such as testing a hypothesis
that one group would have a greater mean value on a measure than would the other
group (Rensvold & Cheung, 2001; Vandenberg & Lance, 2000).
Not supporting configural invariance within this scenario is highly undesirable
because it essentially means that the data collection effort was largely a waste, forcing
the researchers to collect new data or engage in some activity to reduce the differences.
One such activity may be a scale development effort (a topic addressed below) where
the premise is that although the construct exists for the nonsupportive group, the item
set from the current measurement instrument is not representing the content domain of
that construct as conceptually accepted by that group. Thus, a new measure containing
items representing the domain of that construct for that group would be developed (for
a detailed treatment of methods and analysis issues within cross-cultural contexts, see
van de Vijver & Leung, 1997). The point, however, is that creating new measures
within each culture is not an easy undertaking, particularly considering the complica-
tions for validating the two different instruments prior to application. Similarly,
assuming the study fails to support metric invariance, the option would be to engage in
some form of partial metric invariance strategy, which as recently noted by Rensvold
and Cheung (2001) is a complex undertaking (again, a topic expanded upon below).
In summary, the important point here is that there are serious implications to not
supporting either configural or metric invariance within this first scenario. As such,
researchers need to be as confident as possible in the ability of the analytical tools to
detect or be sensitive to the phenomena of interest. In this case, those phenomena are
the stability of the conceptual frames of reference and similarities in the calibration of
item responses to the true score. However, interest could lie in one or more of the other
ME/I tests as well. It is the goal of gaining confidence, however, that justifies the
the need to undertake more research on the sensitivity of the ME/I procedures. That is,
given the seriousness of not supporting invariance when support is desired, confidence
in the validity of the procedures is a necessity. Otherwise, the researchers may make
some radical decisions that are unwarranted (i.e., throw out data that were difficult to
collect; develop new scales of the same construct but that represent the domain of the
construct within each group).
Confidence in the sensitivity of the ME/I procedures is equally relevant to the sec-
ond scenario, which was most recently raised by Riordan, Richardson, Schaffer, and
Vandenberg (2001) in a review of the change literature. Namely, it is conceivable that
differences between groups or within a group across time are desirable outcomes in the
sense that they are stated a priori. Specifically, whether due to direct, purposeful influ-
ences (e.g., an intervention into work structures) or to indirect processes (e.g., the
experience of work itself through socialization), there are conceptual reasons to expect
changes, and thus the ME/I tests are used as hypothesis-testing tools (an issue treated
in more detail below). Vandenberg and Self (1993) alluded to this by demonstrating
that on the first day of work, their sample did not have the needed conceptual frame of
reference to respond to a set of organizational commitment items, but the frame was
there after 3 months (i.e., after gaining work experience) and remained the same 6
months after job start. Similarly, Vandenberg and Scarpello (1990) predicted success-
fully that a model of work adjustment would describe the adjustment process of new-
comers but not of more tenured employees (note that this would be one type of struc-
tural invariance test using the parlance of Vandenberg & Lance, 2000). For the same
rationale developed within the context of the first scenario, maximum confidence in
the sensitivity of the ME/I procedures is imperative under this scenario. Otherwise, the
researcher may falsely infer differences due to an intervention and proclaim the inter-
vention a success when in reality the intervention had no effect, or attribute some
change in workplace attributes (e.g., attitudes, behavior) to some conceptual process
that is perhaps inaccurate in reality.
It is not the author's intent to give the impression that the ME/I procedures are
absolutely insensitive or invalid, that is, that they detect solar ray fluctuations instead
of, for example, stability in conceptual frames of reference. There is compelling
indirect evidence that this is not the case. For example, within the context of the author's work,
Vandenberg and Self (1993) did not support configural invariance for two commitment
measures from time 1 (responses gathered during the first hour of the first day of work)
to time 2 (3 months after entry) and time 3 (6 months after entry). The largest contribu-
tion to the strength of the fit indices inferring that configural invariance was untenable
came from the time 1 data, whereas the smallest contributions came from the time 2
and time 3 data (i.e., well over 60% of the final chi-square value was due to time 1). The
authors undertook an exploratory factor analysis to see what factor pattern defined the
data. One of the commitment measures was defined by two factors and the other
measure by three factors at time 1, but both were defined through one factor each (as they
should be) at times 2 and 3. Similarly, Riordan and Vandenberg (1994) failed to sup-
port metric invariance between Korean and U.S. employees in the first test, but did so
after removing the few items exhibiting the strongest differences and reconducting the
metric invariance test. If it can be assumed that variation in the level of the statistical
test underlying an ME/I procedure is a proxy for its sensitivity, then the fact that the
test's outcome covaried with the presence and absence of the purported cause for the
lack of invariance is supportive.
The above examples are only indirect, however, and do not quell a stronger concern
with respect to the sensitivity issue. That concern is one of "How sensitive?" or "What
thresholds need to be reached in order to unequivocally infer that ME/I exists?" The
author has recently been involved in some as-yet-unpublished research where the
ME/I test failed to support configural invariance, but every post hoc analysis also failed
to uncover how the factor pattern differed. Could it be a case of being too sensitive?
Cheung (personal communication, 2002) noted similar issues in his research on ME/I
procedures.
Susceptibility
Again, the focal issue here is even if the analytical procedures are sensitive to the
phenomena as defined above, how susceptible are they to artifacts? One illustrative
response to this question has been provided by Rensvold (personal communication,
2002). In response to the potential need to redevelop new measures of old constructs
but using the content domain of the construct for a particular culture, Rensvold stated:
Interestingly, my PhD student Ms. Leung is working on precisely that problem. The
next-to-last edition of [Organizational Research and Methods] had an article report-
ing a new generalized self-efficacy [SEFF] scale. Ms. Leung has checked its ME/I
across groups of Chinese and American students. It falls apart. The research question
at this point is, "Exactly what are the differences between American and Chinese
conceptualizations of SEFF?" She's planning to go the focus-group route.
Actually, though, I think the situation with respect to ME/I is even more dire than
generally supposed. Think of SEFF and self-esteem (SEST), two related although
conceptually distinct constructs. Suppose we have a survey form with both SEFF and
SEST items in random order. When given to a bunch of (say) U.S. subjects, the
responses decompose into two distinct factors, the way they should (in our opinion).
When given to another nationality, however, there may be strong cross-loadings. The
two factors may not be distinct at all. In such a situation, ME/I would fail at the
configural invariance level . . . but only because items tapping these two, very specific
constructs appear on the same sheet of paper. If we paired SEFF items with (say)
JOBSAT [job satisfaction] instead of SEST items, we may in fact find that we have
configural invariance. Further, we may find SEFF to be invariant at the LX [factor
loading] level. So it seems to me (although I have no data showing this) that configural
invariance is not a characteristic of one particular construct. It also depends upon the
"conceptual neighborhood," or the local nomological net. This would be a good
argument for testing constructs for ME/I one at a time, that is, one survey, one
construct. But doing so sweeps the dirty fact under the rug: just because a set of
indicators displays ME/I, there's no reason to assume that the construct has the same meaning
across groups, in the sense that it occupies the same place in the nomological net. Its
correlations with other, related constructs may be very different, as may its implica-
tions for behavior.
Research Agenda
Experimental social psychologists have been studying attitude formation and change
for nearly a century and have been successful in creating and shifting a new attitude
(in the sense that it had not existed yet in the minds of their subjects) for their research
purposes. Effectively evaluating the sensitivity of the ME/I procedures to detect the
forms of change will require the adoption and adaptation of similar methodology,
which will require the use of laboratory techniques for some length of time. With
respect to configural invariance, for example, a study could be undertaken in which
time 1 data are collected from respondents in experimental and control groups on a set
of themes for which they have no attitude or frame of reference (however, pretesting
demonstrated that when the frame of reference or attitude exists, the items load on a
common factor). Experimental subjects could then be subjected to some manipulation
that provides the conceptual frames of reference for the themes. Obviously, subjects in
both groups are asked at some point (time 2) to again complete the instruments. Other
studies could be similarly designed to address other aspects of the sensitivity issue.
This type of method would easily lend itself to the inclusion of other manipulations if
conceptually warranted.
Supplemental simulation studies using Monte Carlo procedures could help us
address the issue of thresholds (e.g., how much difference between factor loadings is
required before metric invariance is untenable; how much misspecification in the fac-
tor pattern matrix is required before configural invariance is not supported). Further-
more, Monte Carlo procedures would be required to help us understand the influence
of study characteristics on the tenability of the ME/I tests such as differential sample
sizes between groups, or departures from multivariate normality in one group versus
another or across time within a group.
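The kind of simulation in view can be sketched in a few lines. The code below is only a toy version of the thresholds question: it uses the difference between two groups' regression slopes as a stand-in for a factor loading difference (the sample size, loading values, and z-test are illustrative assumptions, not a full multigroup CFA) and asks how often a difference of a given size is flagged:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def rejection_rate(delta, n=200, reps=500, alpha=0.05):
    """Proportion of replications in which a loading (slope) difference
    of size delta between two groups is flagged as significant."""
    crit = stats.norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        xi1, xi2 = rng.normal(size=n), rng.normal(size=n)
        y1 = 0.8 * xi1 + rng.normal(scale=0.6, size=n)
        y2 = (0.8 + delta) * xi2 + rng.normal(scale=0.6, size=n)
        fit1 = stats.linregress(xi1, y1)
        fit2 = stats.linregress(xi2, y2)
        z = (fit1.slope - fit2.slope) / np.hypot(fit1.stderr, fit2.stderr)
        hits += abs(z) > crit
    return hits / reps

# No true difference: flagged at roughly the nominal 5% rate.
rate0 = rejection_rate(0.0)
# A loading difference of 0.3: flagged nearly always at these settings.
rate3 = rejection_rate(0.3)
print(rate0, rate3)
```

A fuller study along these lines would vary sample size ratios, distributional shape, and the invariance level being tested, which is exactly the Monte Carlo agenda proposed above.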
Summary
In conclusion, the primary point to this section was to stimulate research on ME/I
procedures oriented toward (a) increasing confidence in the validity of the procedures
and (b) shifting the application of those procedures to hypothesis-testing frameworks.
Specifically, the sensitivity and susceptibility issues are at the core of the procedures
validity. Research on those issues would stimulate greater confidence than is currently
the case that the procedures are truly detecting changes in the psychological pro-
cesses underlying respondents reactions toward a measure. It is in these very psycho-
logical processes that researchers have the most interest. However, as it currently
stands, it has not been unequivocally demonstrated that the ME/I procedures are able
to detect variations in those processes. Furthermore, research on these issues would
demonstrate what the potential boundary conditions to that ability are by noting how
susceptible the ME/I procedures are to other systematic influences characterizing the
study itself. Implicit in the paragraphs above was the suggestion that although the ME/
I procedures have been applied primarily as pretesting tools (where invariance needs to
be demonstrated before undertaking the substantive tests), they could, if valid, be pow-
erful tools in a hypothesis-testing context, a subject addressed more succinctly
below. That is, a researcher may have a priori grounds as to why differences in the psy-
chological process may exist and may wish to use the ME/I procedures to verify those
differences.
This topic is treated lightly here because Rensvold and Cheung (2001) provide an
excellent treatise on it. All readers using ME/I procedures are highly encouraged to
first read their chapter. Cheung and Rensvold brought this issue to the attention of
researchers 3 years ago (Cheung & Rensvold, 1999; Rensvold & Cheung, 1998).
Recall that even when not dealing with ME/I, the identification of the measurement
model requires fixing the loading of one item in each scale to the value of 1, which is
the referent indicator for that scale. Typical practice is simply to select an item. As
such, in a test of metric invariance, this item is by default fixed equal between groups or
across time within a group. Rensvold and Cheung (2001) noted, however, that the very
validity of the results from the metric invariance test can be jeopardized through the
typical practice. They argued instead that the selection of the referent indicator needs
to be systematically approached when tests for metric invariance are involved, and
only an item that is truly invariant should be selected as the referent. In the simplest
terms possible, the problem arises when the researcher inadvertently selects as referent
indicator an item that is not metrically invariant. The test then is inaccurate and conclu-
sions from it may not be warranted. Furthermore, if this misspecification is carried
into the ME/I tests following the test for metric invariance, their outcomes may be
equally inaccurate.
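The arithmetic behind the problem can be shown with a toy calculation (the loading values are invented, and this rescaling is only the identification step, not Rensvold and Cheung's selection procedure). Fixing a referent loading to 1 expresses every other loading relative to it, so a noninvariant referent makes the truly invariant items look noninvariant while the referent itself looks fine:

```python
import numpy as np

# Hypothetical true loadings for a three-item scale in two groups;
# only item 0 is actually noninvariant.
true_a = np.array([0.4, 0.8, 0.8])
true_b = np.array([0.8, 0.8, 0.8])

def rescale(loadings, referent):
    """Identify the latent metric by fixing the referent loading to 1,
    which rescales every other loading relative to it."""
    return loadings / loadings[referent]

# Noninvariant item 0 as referent: the two invariant items now appear
# noninvariant, and the real culprit is hidden.
bad_a, bad_b = rescale(true_a, 0), rescale(true_b, 0)
print(bad_a, bad_b)   # [1. 2. 2.] vs. [1. 1. 1.]

# A truly invariant referent (item 1) correctly isolates item 0.
good_a, good_b = rescale(true_a, 1), rescale(true_b, 1)
print(good_a, good_b)  # [0.5 1. 1.] vs. [1. 1. 1.]
```

This is why an arbitrary "first item" convention can distort the entire metric invariance test: the conclusions depend on which item anchors the scale.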
Rensvold and Cheung's (2001) chapter is highlighted less to alert readers to heed
the authors' advice (which they obviously should) and more as an example of a pro-
grammatic and systematic approach to addressing an ME/I issue. The authors did not
stop at simply alerting us to the issue and the implications it has on making inferences
from the ME/I tests. Rather, they present a set of reasonable alternatives that may be
employed in order to address the concern within our studies. The alternatives, how-
ever, were validated or legitimized through research in which Rensvold and Cheung
set out purposely to demonstrate how the alternatives worked to resolve the issue.
Their approach is exactly the type required to address other ME/I issues raised in this
article.
This second issue is directed succinctly at the question "But can it?" The issue arose
during the development of the Vandenberg and Lance (2000) article. The context for
this issue is one in which conceptual reasons justify a test for mean differences
between groups on some measure, but before doing so, the researcher subjects the data
for that measure to tests for configural and metric invariance. Furthermore, the tests
support configural invariance but not metric invariance. Subsequently, after accurately
identifying the items that are not invariant, the researcher engages in a partial metric
invariance strategy whereby the noninvariant items are freely estimated in each group
but the invariant items are fixed equal between groups. This pattern of free and fixed
constraints is kept in place as the researcher tests for mean differences using the eighth
test listed in the overview above, which is a test of latent mean differences between
groups (e.g., see Vandenberg & Self, 1993).
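The model comparison underlying this sequence of tests is the familiar chi-square difference (likelihood ratio) test between nested models. The sketch below shows the computation; the fit statistics are hypothetical illustrations, not results from any cited study.

```python
from scipy.stats import chi2

# Hypothetical fit statistics for two nested models (illustration only):
# a configural model with all loadings freely estimated in each group,
# and a partial metric model that fixes equal across groups only the
# loadings of the items judged invariant.
configural = {"chisq": 210.4, "df": 98}
partial_metric = {"chisq": 214.9, "df": 101}  # 3 loadings constrained equal

# Chi-square difference test: a nonsignificant result supports
# retaining the added equality constraints.
delta_chisq = partial_metric["chisq"] - configural["chisq"]
delta_df = partial_metric["df"] - configural["df"]
p = chi2.sf(delta_chisq, delta_df)
print(f"delta chisq = {delta_chisq:.1f}, delta df = {delta_df}, p = {p:.3f}")
```

With these illustrative numbers the difference is nonsignificant, so the partial metric constraints would be retained before moving on to the latent mean test.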
When invariance is not an issue, a test of latent mean differences is supposedly
more accurate and, therefore, more advantageous than a traditional mean difference
test because error has been partialed out of the estimate of the latent mean prior to the
difference test (Cheung & Rensvold, 2000, p. 192). This assumption has been
extended to tests of latent means under conditions of partial metric invariance. That is,
the tests of mean differences will be more accurate with the imposed constraints repre-
senting the partial metric invariance than without them because the latent means are
adjusted for or accommodating the fact that only partial, not full, metric invariance
characterizes the data. But is it? Or, more precisely, if it does result in an adjustment as
implied, how is this accomplished? At present, neither a definitive answer to this
question nor even a source that has treated the topic is available.
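To see what such an adjustment would have to accomplish, recall the mean structure of a single-factor model: each observed item mean decomposes as intercept plus loading times latent mean. The sketch below, with hypothetical parameter values, shows the full-invariance baseline case in which the latent mean difference is recovered exactly from the observed mean differences; how the estimate behaves when only partial invariance holds is precisely the open question raised above.

```python
import numpy as np

# Mean structure of a single-factor model: observed item means are
# mu = tau + lam * kappa (intercept + loading * latent mean).
# Hypothetical invariant parameters for three items:
tau = np.array([2.0, 1.5, 1.8])   # item intercepts
lam = np.array([1.0, 0.8, 0.6])   # factor loadings (item 1 is referent)

kappa = {"group1": 0.0, "group2": 0.4}  # latent means (group 1 fixed to 0)

# Implied observed means in each group:
mu1 = tau + lam * kappa["group1"]
mu2 = tau + lam * kappa["group2"]

# Given invariant tau and lam, the latent mean difference is recovered
# by regressing the observed mean differences on the loadings:
diff = mu2 - mu1
kappa_diff, *_ = np.linalg.lstsq(lam[:, None], diff, rcond=None)
print(kappa_diff)   # recovers the latent mean difference of 0.4
```

When some loadings or intercepts are noninvariant, the decomposition no longer holds for those items, and it is not obvious how freeing them "adjusts" the latent mean estimate, which is the author's point.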
Research Agenda
With respect to the referent indicator issue, Cheung and Rensvold are continuing
their work in this area. As it currently stands, some of their alternatives are complex to
apply and limited to situations in which one has only a few items to define each latent
variable (see Rensvold & Cheung, 2001). Thus, they are undertaking research to refine
the techniques and make them more user-friendly. In that same vein, Cheung
and Rensvold recognized that not all researchers share their background in methods
and, therefore, may experience difficulty navigating through some of the complexity
underlying their alternatives as they currently stand. As such, these authors set up tem-
porary aids and even offered on a limited time basis to consult with researchers.
One research avenue worthy of exploration in the referent indicator context is
whether similar conclusions as reached by Cheung and Rensvold with regard to which
item should be the referent indicator could be achieved using more conventional
exploratory factor analysis (EFA) techniques. Because configural invariance is a nec-
essary condition before applying tests for metric invariance (and, therefore, because
we have supported the a priori conceptual framework defining our measures), EFA
would not be employed in its traditional role as a post hoc means to find the factor pat-
tern within our data. Besides, this should be irrelevant if the fit indices in the test for
configural invariance were quite strong, since the same pattern should emerge from the
EFA (Hurley et al., 1997). The role for the EFA in this context would be simply to
examine (what may be described by some as an interocular eyeball test) the loadings to
identify one for each measure that is very close in value between groups not only in
terms of its loading on its focal factor but also in terms of its cross-loadings on the other
factors. The type of EFA would assume oblique factors (i.e., correlated) and, thus,
employ either an oblimin or promax rotation. Furthermore, it would use either principal
axis or maximum likelihood procedures, because the more traditional principal
components approach is not a common factor model.
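The "interocular eyeball test" described above can be operationalized as a simple between-group comparison of rotated loading matrices. The sketch below assumes the oblique EFA has already been estimated separately in each group; the loading values are hypothetical.

```python
import numpy as np

# Hypothetical promax-rotated EFA loading matrices (items x factors)
# estimated separately in each group; values for illustration only.
load_g1 = np.array([[0.75, 0.10],
                    [0.68, 0.05],
                    [0.71, 0.22],
                    [0.12, 0.80]])
load_g2 = np.array([[0.52, 0.30],
                    [0.67, 0.06],
                    [0.70, 0.35],
                    [0.10, 0.78]])

# For each item, summarize how close its full loading row (focal
# loading AND cross-loadings) is between the two groups:
row_diff = np.abs(load_g1 - load_g2).sum(axis=1)

# The best referent candidate for factor 1 is the factor-1 item whose
# loadings differ least across groups:
factor1_items = [0, 1, 2]
best = min(factor1_items, key=lambda i: row_diff[i])
print(best, row_diff)   # item at index 1 is the most stable candidate
```

This is only the comparison step, not a substitute for Rensvold and Cheung's validated procedures; it simply turns the visual inspection of loadings into an explicit, reproducible rule.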
Summary
The main point of this section is similar to that of the previous section, in that the
concern is on undertaking research to improve the accuracy or validity of the ME/I pro-
cedures. Specifically, as noted by Rensvold and Cheung (2001) and by Vandenberg
and Lance (2000), although researchers frequently invoke partial metric invariance
strategies, the practice of doing so is often questionable. It is less a case, however, that
these researchers are knowingly engaging in questionable practices and more a case
that the guidance they are using to operationalize the strategy within their research is
not well formulated. Rensvold and Cheung's work on identifying a proper referent
indicator is clearly a stream of research directed at the heart of the issue, with improv-
ing guidance as the sole motive. As noted above, however, their work is at the begin-
ning stages and needs to be refined through even more research. Similarly, researchers
justify the application of partial metric invariance strategies in part on the belief that
doing so makes the ultimate test of mean differences more accurate than if some other
strategy had been employed. Again, more research is needed to ascertain whether this
is a valid assumption.
Whether the failure involves configural invariance or one or more of the other forms of
invariance, the major point is that there do exist conceptual frameworks that lend
themselves to systematically addressing the trigger or causal issue.
Research Agenda
One study in this agenda would examine whether configural invariance fails because a
common frame of reference is not evoked for that culture through the instrument. An interesting variant
to this study would be to take, for example, the items defining Culture Y's frame of
reference for a construct and ask members of Culture X to identify what those items
mean to them. Doing so would simply add face validity to the statistical evidence
indicating why configural invariance was not supported. The statements from Culture
X's members would indicate that either the items did not have a common frame of
reference or that the items may belong to many different frames of reference from their
viewpoint.
Summary
At the risk of stating the obvious, addressing the issues in this section is important
to further establishing the validity of the ME/I tests and to extending their role beyond
the current one of being primarily a pretest tool to also being a hypothesis testing tool.
Most researchers have been satisfied with proxy variables as representations of the
possible trigger events. Little's (2000) critique of the Cheung and Rensvold (2000)
study focuses on this concern and the possibility that Cheung and Rensvold relied too
much on a proxy of the trigger event without actually operationalizing the trigger.
However, even the author's own work is symptomatic of this concern. For example,
Riordan and Vandenberg (1994) assumed that just because a person was a Korean
national, he or she would automatically subscribe to a collectivistic value system. The
same expectation was made of U.S. nationals, but in their case subscribing to an
individualistic value system. Failures to support invariance were attributed to those
cultural differences. The main point here is that the authors did not directly operationalize
the degree of subscription to collectivistic and individualistic value systems and,
hence, did not directly assess the association between the purported trigger (i.e.,
degree of subscription) and the failure to support invariance. If the operationalizations had
been included, the validity of the ME/I procedures would have been strengthened
because the findings would have shown a direct link between the purported trigger (the
subscription to the value systems) and the invariance test.
Conclusions
The ideas presented above are by no means exhaustive of the researchable issues
characterizing the ME/I arena. For example, much of what is presented in the very last
section is crossing into item response theory (IRT) and, in particular, differential item
functioning (DIF). This realization, in turn, calls for comparative research examining
the relative merits of the ME/I procedures and other procedures purportedly examin-
ing item behavior. Chan (2000) presented one such example of research in this vein. He
extended the ME/I approach into the detection of uniform and nonuniform DIF. Simi-
larly, Scandura, Williams, and Hamilton (2001) used both the ME/I approach pre-
sented here and traditional IRT analyses to examine the behavior of an organizational
politics scale. Also, as a point of contrast to using only ME/I to examine cultural differ-
ences, Zhou, Schriesheim, and Beck (2001) presented the same issues but from an IRT
perspective. It is beyond the scope of this article to attempt an overview of this
research, as it would mean first presenting an introduction to IRT. The cited articles,
however, provide excellent starting points for those interested in this topic.
In closing, the primary intent of the current article is to stimulate thinking on these
issues and, by doing so, encourage continuing research addressing both the validity
and applicability of the ME/I procedures. A second goal is to educate others that there
are reasons to carefully monitor the literature, a concern recently expressed as well
by Little (2000). There are serious implications (e.g., throwing data out) for not sup-
porting ME/I when support is desired. It would be unfortunate to take such a serious
step when it is not warranted. Furthermore, it would be equally unfortunate to con-
clude that some organizational change process was successful when indeed it was not.
Among the issues presented in this article, the most important in the sense of address-
ing it first is the sensitivity issue. It resides at the core of the procedures' validity in that
it would address directly whether the procedures are detecting what researchers
believe they should be. If they are not valid, then the other research questions are moot.
However, that is one person's opinion. One final point of this article is that much of the
content presented here came through informal exchanges with many different
colleagues from different corners of the globe. For the author at least, these types of
exchanges are a tremendous source of motivation, and he is grateful to those
individuals for their willingness to engage in these conversations.
References
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor
covariance and mean structures: The issue of partial measurement invariance. Psychologi-
cal Bulletin, 105, 456-466.
Chan, D. (1997). Racial subgroup differences in predictive validity perceptions on personality
and cognitive ability tests. Journal of Applied Psychology, 82, 311-320.
Chan, D. (2000). Detection of differential item functioning on the Kirton Adaptation-Innovation
Inventory using multiple-group mean and covariance structure analyses. Multivariate
Behavioral Research, 35, 169-200.
Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A
reconceptualization and proposed new method. Journal of Management, 25, 1-27.
Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in
cross-cultural research using structural equation modeling. Journal of Cross-Cultural Psy-
chology, 31, 187-212.
Gorsuch, R. L. (1990). Common factor analysis versus component analysis: Some well and little
known facts. Multivariate Behavioral Research, 25, 33-39.
Gulliksen, H., & Wilks, S. S. (1950). Regression tests for several samples. Psychometrika, 15,
91-114.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement
invariance in aging research. Experimental Aging Research, 18, 117-144.
Hurley, A. E., Scandura, T. A., Schriesheim, C. A., Brannick, M. T., Seers, A., Vandenberg, R. J.,
& Williams, L. J. (1997). Exploratory and confirmatory factor analysis: Guidelines, issues,
and alternatives. Journal of Organizational Behavior, 18, 667-683.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analy-
sis. Psychometrika, 34, 183-202.
Lawler, E. E., III. (1992). The ultimate advantage: Creating the high involvement organization.
San Francisco: Jossey-Bass.
Lawler, E. E., III. (1996). From the ground up: Six principles for building the new logic corpora-
tion. San Francisco: Jossey-Bass.
Zhou, X., Schriesheim, C. A., & Beck, W. (2001). The importance of measurement equivalence
in transnational research: A test of individual-level predictions about culture and the differ-
ential use of organizational influence tactics, with and without measurement equivalence.
In C. A. Schriesheim & L. L. Neider (Eds.), Research in management: Vol. 1. Equivalence
in measurement (pp. 51-98). Greenwich, CT: Information Age.
Robert J. Vandenberg received a Ph.D. in social psychology from the University of Georgia. He is a full pro-
fessor in the Department of Management of the Terry College of Business at the University of Georgia. He is
currently past division chair of the Research Methods Division of the Academy of Management. He has
served on the editorial boards of the Journal of Applied Psychology, Journal of Management, Organiza-
tional Behavior and Human Decision Processes, and Organizational Research Methods. He is currently an
associate editor for Organizational Research Methods. His research interests include research methods,
high-involvement work processes, and employee work adjustment processes.