Toward a Further Understanding of and Improvement
in Measurement Invariance Methods and Procedures

ROBERT J. VANDENBERG
University of Georgia
Starting some years prior to its publication, but most certainly since the
appearance of the article by Vandenberg and Lance (2000), the topic of measurement
equivalence/invariance (ME/I) has been the catalyst of many stimulating conversa-
tions. Some of these conversations have been consultative in that input has been sought
or the author has sought it of others; some have been challenges in that an aspect of the
topic is questioned (e.g., the analytical procedures, its applicability, etc.); and others
have been of the comparative variety (e.g., differences between ME/I and differential
item functioning). Although the conversations had different origins, they shared a
common element. That is, they all started with questions denoting a need for continued
research on ME/I, not only on its underlying analytical procedures but also on its
applicability. In other words, the questions all highlighted deficits in our understanding
of the conditions that make ME/I most appropriate, and what its limitations may be
with respect to its application. As stated by Little (2000, p. 218), "many issues still
need to be considered and examined before these procedures can be used unequivocally."

Author's Note: The author was supported in part by a grant from the Centers for Disease Control and Prevention (CDC) and the National Institute for Occupational Safety and Health (NIOSH) (Grant No. 1R010H03737-01A1; D. M. DeJoy, principal investigator). Its contents are the sole responsibility of the author and do not necessarily represent the official views of the CDC or NIOSH. The author wishes to thank the membership of the Research Methods Division of the Academy of Management for appointing him as division chair. The current article represents an opportunity born out of that appointment and, as such, would not have been possible without their support. The author also wishes to thank the editor for providing the platform to present these thoughts. Correspondence concerning this article should be addressed to Robert J. Vandenberg, Terry College of Business, Department of Management, University of Georgia, Athens, GA 30602-6256; e-mail: rvandenb@terry.uga.edu.

Organizational Research Methods, Vol. 5 No. 2, April 2002 139-158
© 2002 Sage Publications
The primary goal of this article is to provide a synopsis of the key questions that
might form the basis of future research. There are two motives underlying this goal.
First, addressing each question through research is more than one or two persons could
handle. Hence, the motive is to stimulate others to make addressing these questions a
component of their programs of research. Some of the following ideas are more care-
fully thought out than others, and some are already subjects of ongoing programmatic
research efforts. Nonetheless, the material is presented to help push thinking on these
issues. The second motive is less altruistic and more cautionary. Namely, as is so often
the case when something new enters the methodological arena (ME/I analytical pro-
cedures are not new, but they have been experiencing an awakening within organiza-
tional research), there is adoption fervor. A positive aspect of the fervor is that
researchers are attending closely to the topic and, therefore, accepting the premise that
it is warranted. A negative aspect of this fervor, however, is characterized by unques-
tioning faith on the part of some that the technique is correct or valid under all circum-
stances. Thus, the second motive is to temper this side of the fervor by noting that the
reason so many conversations occur is that there are a host of unknowns underlying
ME/I. Furthermore, until research addresses the unknowns, potential adopters are
asked to monitor the literature closely or run the risk of inappropriately applying it, or
perhaps inferring something about their data that is simply not accurate.
Making no claims that these are exhaustive, the following issues are introduced: (a)
the sensitivity and susceptibility of the ME/I analytical procedures; (b) partial metric
invariance; and (c) triggers, antecedents, and causes of not supporting ME/I when sup-
port is desired. The author also makes no claim of credit for these issues. Rather, these
are thoughts spanning literally years of conversations with David Chan, Gordon
Cheung, Charles Lance, Roger Rensvold, and Neal Schmitt, all individuals who have
themselves devoted large amounts of time to the ME/I topic. There have been conver-
sations with many others, such as Kathleen Bentein, Jeff Edwards, Mark Gavin, Larry
James, Hettie Richardson, Chris Riordan, and Larry Williams. Thus, it is the author's
preference to view this as an attempt to share with the reader a slate of ideas born out of
all of these individuals' perspectives. Immediately following is a brief overview of the
types or levels of ME/I tests. Details for each type are provided in Vandenberg and
Lance (2000).
Overview of ME/I
Borrowing liberally from Vandenberg and Lance (2000), the need for conducting
ME/I tests was justified on the premise that there are circumstances that threaten the
quality of our measurement tools, but are not directly addressable through classical
test theory approaches such as the calculation of reliability coefficients. It was further
argued that rather than test for the presence of such circumstances, most researchers to
date have simply ignored their potential presence and, thereby, assumed them away.
Example questions of possible situations in which these circumstances are highly
probable and, therefore, require the use of ME/I test procedures are as follows: (a) Do
The general question of invariance of measurement is one of whether or not, under dif-
ferent conditions of observing and studying phenomena, measurements yield mea-
sures of the same attributes. If there is no evidence indicating presence or absence of
measurement invariance (the usual case) or there is evidence that such invariance
does not obtain, then the basis for drawing scientific inference is severely lacking:
findings of differences between individuals and groups cannot be unambiguously
interpreted. (p. 117, emphasis added)
Vandenberg and Lance (2000) organized the ME/I tests into the following eight-step sequence:

1. An omnibus test of the equality of covariance matrices across groups, that is, a test of
the null hypothesis of invariant covariance matrices (i.e., Σ(g) = Σ(g′)), where g and g′
index the groups.
2. A test of configural invariance, that is, of the null hypothesis that the same pattern of
fixed and free factor loadings holds across groups.
3. A test of metric invariance, that is, of the null hypothesis that like items' factor loadings
are invariant across groups (i.e., Λ(g) = Λ(g′)).
4. A test of scalar invariance, that is, of the null hypothesis that like items' intercepts are
invariant across groups (i.e., τ(g) = τ(g′)).
5. A test of the null hypothesis that like items' unique variances are invariant across groups
(i.e., Θ(g) = Θ(g′)). Tests (3) through (5) follow the same sequence as that recommended by
Gulliksen and Wilks (1950) for tests of homogeneity of regression models across
groups and should be regarded as being similarly sequential, so that tests of scalar
invariance should be conducted only if (at least partial) metric invariance is established,
and tests of invariant uniquenesses should proceed only if (at least partial) metric and
scalar invariance has been established first.
6. A test of the null hypothesis that factor variances were invariant across groups (i.e.,
φjj(g) = φjj(g′)). This was sometimes treated as a complement to test (3), where differences in fac-
tor variances were interpreted as reflecting group differences in the calibration of true
scores (e.g., Schaubroeck & Green, 1989; Schmitt, 1982; Vandenberg & Self, 1993).
7. A test of the null hypothesis that factor covariances were invariant across groups (i.e.,
φjj′(g) = φjj′(g′)). This was sometimes treated as a complement to test (2).
8. A test of the null hypothesis of invariant factor means across groups (i.e., κ(g) = κ(g′)), which
often was invoked as a way to test for differences between groups in level on the con-
struct of interest.
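Movement from one step of this sequence to the next is typically evaluated with a chi-square difference test between adjacent nested models. A minimal sketch follows; the fit statistics below are invented for illustration and are not taken from any study:

```python
from scipy import stats

# Hypothetical fit statistics for two nested multigroup models: the
# configural model (loadings free in each group) and the metric model
# (like items' loadings constrained equal across groups).
chisq_configural, df_configural = 212.4, 104
chisq_metric, df_metric = 227.9, 112

# The constrained (metric) model is rejected, and metric invariance judged
# untenable, when the chi-square difference is improbably large for the
# difference in degrees of freedom.
d_chisq = chisq_metric - chisq_configural
d_df = df_metric - df_configural
p = stats.chi2.sf(d_chisq, d_df)
print(f"delta chi-square = {d_chisq:.1f} on {d_df} df, p = {p:.3f}")
```

The same comparison logic applies at each adjacent pair of steps (e.g., metric versus scalar), with the more constrained model always tested against the model one step less constrained.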
Other observations made by the authors included the fact that all eight tests were
rarely conducted in the same study; rather, researchers chose tests based on their par-
ticular research needs. Furthermore, the most frequently conducted tests were those
for configural and metric invariance. By far the most important observation made by
Vandenberg and Lance (2000), however, was the fact that there were many cases in
their collection of studies where inaccurate inferences would have been made by the
various researchers if they had not undertaken the ME/I tests. That is, if the various re-
searchers had simply undertaken the tests of substantive interest (usually a comparison
of means) without first undertaking the ME/I tests, their conclusions would not have
been accurate. Even in cases where invariance was not an issue after being tested, the
authors could conclude much more confidently that observed differences were a func-
tion of the substantive phenomenon (i.e., were valid) and not due to some measure-
ment artifact. Regardless of the power of the ME/I procedures, however, there remain a
number of questions or issues that themselves highlight the need for continued re-
search on those procedures so that even greater confidence in their power is gained. As
noted above, attention to these questions came about through conversations, and it is to
these that the article now turns.
Sensitivity
Dealing with the sensitivity issue resides at the top of the to-do list for one reason.
Namely, as it currently stands, it is uncertain how sensitive the analyses are to detecting
the true underlying phenomena.

[Figure: Decision flowchart for the sequence of ME/I tests. The flow runs from the
omnibus test of invariant covariance matrices (Σ(g) = Σ(g′)), where support means no
further ME/I tests are warranted, to the test of full metric invariance (Λ(g) = Λ(g′));
failing that, constraints are relaxed to test partial invariance, and if partial invariance
also fails, the groups are not comparable. Supported full or partial invariance leads on
to tests of uniqueness invariance (a test of reliability, or of the homogeneity of
uniquenesses; Θ(g) = Θ(g′)) and factor covariance invariance (φ(g) = φ(g′)), and
finally to comparisons of latent means or of structural parameters.]

Although the sensitivity issue is relevant to all eight
levels of ME/I tests, it is addressed here relative to the two most widely acknowledged
phenomena: (a) detecting shifts in conceptual frames of reference across time or
differences in them between groups and (b) the ability to determine whether the
measurement tool is being calibrated differently to the true score either across time within the
same group or between different groups. The former is referred to as configural
invariance, and the latter as metric invariance in the above overview of the ME/I tests.
For configural invariance, researchers have accepted as evidence supporting invariance
or equivalence, a lack of difference in the pattern of fixed and free factor loadings
between groups or across time within a group (Byrne, Shavelson, & Muthén, 1989;
Horn & McArdle, 1992; Rensvold & Cheung, 2001; Schmitt, 1982; Vandenberg &
Lance, 2000; Vandenberg & Self, 1993). Most frequently cited evidence is whether
specifying the same factor pattern matrix between groups or within a group across time
results in strong fit indices (Vandenberg & Lance, 2000). Less often cited is whether
the factor covariances are the same (Schmitt, 1982; Schaubroeck & Green, 1989;
Vandenberg & Self, 1993). The premise underlying the analyses is that the factor pat-
tern matrix is a reasonable empirical representation of the cognitive frame of reference
against which the individual responded to the set of items, and hence tests of difference
in the factor pattern between groups or shifts across time within a group are also rea-
sonable tests of the equivalence in the conceptual or configural frame of reference.
But is it? More specifically, has anyone unequivocally demonstrated that these
analyses really do detect shifts or differences in conceptual frames of reference? A
review of the journals failed to uncover a direct test of this issue. A related question is,
"Even if the analytical procedures do as claimed, what thresholds need to be reached
for a true difference or shift to exist?"
Putting these questions aside for the moment, similar issues exist for the detection
of the second phenomenon: metric invariance, or differences between groups or shifts
across time in how the observed score (typically a response to an item from a multi-
item operationalization of a construct) is calibrated to the true score. Tests for metric
invariance may only be undertaken if configural invariance is supported (Rensvold &
Cheung, 2001; Vandenberg & Lance, 2000). It makes little sense to test for differences
in calibration to the true score (i.e., the conceptual frame of reference) if the true score
is different between groups or has shifted across time within a group. For metric
invariance, researchers have accepted as evidence supporting invariance, a lack of dif-
ference in the magnitude of the factor loadings for like items between groups or across
time within a group (Byrne et al., 1989; Horn & McArdle, 1992; Rensvold & Cheung,
2001; Schmitt, 1982; Vandenberg & Lance, 2000; Vandenberg & Self, 1993). Most
frequently cited evidence is whether constraining loadings of like items to be equal
between groups or within a group across time results in a model that has equally strong
fit to the data as a model in which the factor loadings are freely estimated (i.e., the
configural invariance model) (Vandenberg & Lance, 2000). Less often cited is whether
the factor variances are the same (Schaubroeck & Green, 1989; Schmitt, 1982;
Vandenberg & Self, 1993). The premise underlying the analyses is that the factor load-
ings are the regression slopes that represent the relationship of item responses (or item
parcels) to their respective latent variables and, thus, represent the expected change in
the response (observed score) for every unit change in the latent variable (Bollen,
1989; Vandenberg & Lance, 2000). In this vein, the loading is accepted as a reasonable
empirical representation of the conceptual scaling used to make responses to an item,
and hence the tests of difference between groups or shifts across time within a group
are also reasonable tests of the equality with which the conceptual frame of reference
influences item set responding (Jöreskog, 1969; Schmitt, 1982; Vandenberg & Lance,
2000; Vandenberg & Self, 1993).
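The slope interpretation can be made concrete with a small simulation. In the sketch below (the loading values, uniqueness variance, and sample size are arbitrary choices for illustration), the same one-unit change in the latent variable produces a different expected change in the observed response in each group, which is exactly the kind of difference a metric invariance test is meant to catch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent true scores for two groups, identically distributed.
xi_a = rng.normal(0.0, 1.0, n)
xi_b = rng.normal(0.0, 1.0, n)

# Hypothetical loadings for the same item: group B calibrates the item
# to the true score only half as strongly, violating metric invariance.
lam_a, lam_b = 0.8, 0.4

# Observed item responses: x = lambda * xi + delta (uniqueness).
x_a = lam_a * xi_a + rng.normal(0.0, 0.6, n)
x_b = lam_b * xi_b + rng.normal(0.0, 0.6, n)

# The regression slope of the item on the latent variable recovers the
# loading: the expected change in response per unit change in the construct.
slope_a = np.cov(x_a, xi_a)[0, 1] / np.var(xi_a, ddof=1)
slope_b = np.cov(x_b, xi_b)[0, 1] / np.var(xi_b, ddof=1)
print(round(slope_a, 2), round(slope_b, 2))
```

With identical latent distributions, any difference in these slopes reflects calibration, not the construct itself.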
However, once again, the question is, "Is it?" More specifically, has anyone
unequivocally demonstrated that the metric invariance analyses actually detect shifts
or differences in scaling units? As with configural invariance, an immediate answer to
this question was not found in a search of the literature. The related question is
identical to that stated previously. Namely, "Even if the analytical procedures do as
claimed, what thresholds need to be reached for a true difference or shift to exist?"
The questions point to an issue that may be referred to as sensitivity. The impor-
tance of sensitivity becomes fairly obvious when its effects on the ability to meaning-
fully interpret ME/I outcomes and the decisions as to what to do next as a result of that
interpretation are examined. Two scenarios are presented to highlight the importance
of attending to the sensitivity issue. The first scenario is the one most familiar to read-
ers, that is, viewing a failure to support ME/I as an undesirable outcome. ME/I tests in
this scenario are employed as a pretest to evaluate assumptions of invariance before
undertaking the analyses evaluating an a priori hypothesis such as testing the differ-
ences between two groups on some conceptually meaningful variable. Specifically,
Vandenberg and Lance (2000) noted that tests for ME/I have been undertaken most fre-
quently in situations where researchers were concerned that invariance may not be
supported (when support is the desired goal) in the presence of individual or group dif-
ferences such as gender (e.g., Stacy, MacKinnon, & Pentz, 1993), age (e.g., Marsh,
1993), race (e.g., Chan, 1997), performance evaluation sources (e.g., Van Dyne &
LePine, 1998), and culture (e.g., Riordan & Vandenberg, 1994). Namely, whereas the
pattern of responses for one group may support the conceptual premises underlying
the focal measurement instruments (e.g., U.S. employee responses to an organiza-
tional commitment instrument), the pattern of the other group's responses may differ
dramatically from conceptual expectations (e.g., Chinese employee responses to the
same instrument). An immediate response (and indeed the current recommendation)
to not supporting configural invariance is to say that the groups may not be compared
at all; assuming, of course, that a comparison was desired such as testing a hypothesis
that one group would have a greater mean value on a measure than would the other
group (Rensvold & Cheung, 2001; Vandenberg & Lance, 2000).
Not supporting configural invariance within this scenario is highly undesirable
because it essentially means that the data collection effort was largely a waste, forcing
the researchers to collect new data or engage in some activity to reduce the differences.
One such activity may be a scale development effort (a topic addressed below) where
the premise is that although the construct exists for the nonsupportive group, the item
set from the current measurement instrument is not representing the content domain of
that construct as conceptually accepted by that group. Thus, a new measure containing
items representing the domain of that construct for that group would be developed (for
a detailed treatment of methods and analysis issues within cross-cultural contexts, see
van de Vijver & Leung, 1997). The point, however, is that creating new measures
within each culture is not an easy undertaking, particularly considering the complica-
tions for validating the two different instruments prior to application. Similarly,
assuming the study fails to support metric invariance, the option would be to engage in
some form of partial metric invariance strategy, which as recently noted by Rensvold
and Cheung (2001) is a complex undertaking (again, a topic expanded upon below).
In summary, the important point here is that there are serious implications to not
supporting either configural or metric invariance within this first scenario. As such,
researchers need to be as confident as possible in the ability of the analytical tools to
detect or be sensitive to the phenomena of interest. In this case, those phenomena are
the stability of the conceptual frames of reference and similarities in the calibration of
item responses to the true score. However, interest could lie in one or more of the other
ME/I tests as well. It is the goal of gaining confidence, however, that justifies the
the need to undertake more research on the sensitivity of the ME/I procedures. That is,
given the seriousness of not supporting invariance when support is desired, confidence
in the validity of the procedures is a necessity. Otherwise, the researchers may make
some radical decisions that are unwarranted (i.e., throw out data that were difficult to
collect; develop new scales of the same construct but that represent the domain of the
construct within each group).
Confidence in the sensitivity of the ME/I procedures is equally relevant to the sec-
ond scenario, which was most recently raised by Riordan, Richardson, Schaffer, and
Vandenberg (2001) in a review of the change literature. Namely, it is conceivable that
differences between groups or within a group across time are desirable outcomes in the
sense that they are stated a priori. Specifically, whether due to direct, purposeful influ-
ences (e.g., an intervention into work structures) or to indirect processes (e.g., the
experience of work itself through socialization), there are conceptual reasons to expect
changes, and thus the ME/I tests are used as hypothesis-testing tools (an issue treated
in more detail below). Vandenberg and Self (1993) alluded to this by demonstrating
that on the first day of work, their sample did not have the needed conceptual frame of
reference to respond to a set of organizational commitment items, but the frame was
there after 3 months (i.e., after gaining work experience) and remained the same 6
months after job start. Similarly, Vandenberg and Scarpello (1990) predicted success-
fully that a model of work adjustment would describe the adjustment process of new-
comers but not of more tenured employees (note that this would be one type of struc-
tural invariance test using the parlance of Vandenberg & Lance, 2000). For the same
rationale developed within the context of the first scenario, maximum confidence in
the sensitivity of the ME/I procedures is imperative under this scenario. Otherwise, the
researcher may falsely infer differences due to an intervention and proclaim the inter-
vention a success when in reality the intervention had no effect, or attribute some
change in workplace attributes (e.g., attitudes, behavior) to some conceptual process
that is perhaps inaccurate in reality.
It is not the author's intent to give the impression that the ME/I procedures are
absolutely insensitive or invalid, that is, that they detect solar ray fluctuations instead
of, for example, stability in conceptual frames of reference. There is compelling
indirect evidence that this is not the case. For example, within the context of the author's work,
Vandenberg and Self (1993) did not support configural invariance for two commitment
measures from time 1 (responses gathered during the first hour of the first day of work)
to time 2 (3 months after entry) and time 3 (6 months after entry). The largest contribu-
tion to the strength of the fit indices inferring that configural invariance was untenable
came from the time 1 data, whereas the smallest contributions came from the time 2
and time 3 data (i.e., well over 60% of the final chi-square value was due to time 1). The
authors undertook an exploratory factor analysis to see what factor pattern defined the
data. One of the commitment measures was defined by two factors and the other
measure by three factors at time 1, but both were defined through one factor each (as they
should be) at times 2 and 3. Similarly, Riordan and Vandenberg (1994) failed to sup-
port metric invariance between Korean and U.S. employees in the first test, but did so
after removing the few items exhibiting the strongest differences and reconducting the
metric invariance test. If it can be assumed that variation in the level of the statistical
test underlying an ME/I procedure is a proxy for its sensitivity, then the fact that the
test's outcome covaried with the presence and absence of the purported cause for the
lack of invariance is supportive.
The above examples are only indirect, however, and do not quell a stronger concern
with respect to the sensitivity issue. That concern is one of "How sensitive?" or "What
thresholds need to be reached in order to unequivocally infer that ME/I exists?" The
author has recently been involved in some as-yet-unpublished research where the
ME/I test failed to support configural invariance, but every post hoc analysis also failed
to uncover how the factor pattern differed. Could it be a case of being too sensitive?
Cheung (personal communication, 2002) noted similar issues in his research on ME/I
procedures.
Susceptibility
Again, the focal issue here is even if the analytical procedures are sensitive to the
phenomena as defined above, how susceptible are they to artifacts? One illustrative
response to this question has been provided by Rensvold (personal communication,
2002). In response to the potential need to redevelop new measures of old constructs
but using the content domain of the construct for a particular culture, Rensvold stated:
Interestingly, my PhD student Ms. Leung is working on precisely that problem. The
next-to-last edition of [Organizational Research and Methods] had an article report-
ing a new generalized self-efficacy [SEFF] scale. Ms. Leung has checked its ME/I
across groups of Chinese and American students. It falls apart. The research question
at this point is, "Exactly what are the differences between American and Chinese
conceptualizations of SEFF?" She's planning to go the focus-group route.
Actually, though, I think the situation with respect to ME/I is even more dire than
generally supposed. Think of SEFF and self-esteem (SEST), two related although
conceptually distinct constructs. Suppose we have a survey form with both SEFF and
SEST items in random order. When given to a bunch of (say) U.S. subjects, the
responses decompose into two distinct factors, the way they should (in our opinion).
When given to another nationality, however, there may be strong cross-loadings. The
two factors may not be distinct at all. In such a situation, ME/I would fail at the
configural invariance level . . . but only because items tapping these two, very specific
constructs appear on the same sheet of paper. If we paired SEFF items with (say)
JOBSAT [job satisfaction] instead of SEST items, we may in fact find that we have
configural invariance. Further, we may find SEFF to be invariant at the LX [factor
loading] level. So it seems to me (although I have no data showing this) that configural
invariance is not a characteristic of one particular construct. It also depends upon the
"conceptual neighborhood," or the local nomological net. This would be a good
argument for testing constructs for ME/I one at a time, that is, one survey, one
construct. But doing so sweeps the dirty fact under the rug: just because a set of
indicators displays ME/I, there's no reason to assume that the construct has the same meaning
across groups, in the sense that it occupies the same place in the nomological net. Its
correlations with other, related constructs may be very different, as may its implica-
tions for behavior.
Research Agenda
Experimental social psychologists have been studying attitude formation and change
for nearly a century and have been successful in creating and shifting a new attitude
(in the sense that it had not existed yet in the minds of their subjects) for their research
purposes. Effectively evaluating the sensitivity of the ME/I procedures to detect the
forms of change will require the adoption and adaptation of similar methodology,
which will require the use of laboratory techniques for some length of time. With
respect to configural invariance, for example, a study could be undertaken in which
time 1 data are collected from respondents in experimental and control groups on a set
of themes for which they have no attitude or frame of reference (however, pretesting
demonstrated that when the frame of reference or attitude exists, the items load on a
common factor). Experimental subjects could then be subjected to some manipulation
that provides the conceptual frames of reference for the themes. Obviously, subjects in
both groups are asked at some point (time 2) to again complete the instruments. Other
studies could be similarly designed to address other aspects of the sensitivity issue.
This type of method would easily lend itself to the inclusion of other manipulations if
conceptually warranted.
Supplemental simulation studies using Monte Carlo procedures could help us
address the issue of thresholds (e.g., how much difference between factor loadings is
required before metric invariance is untenable; how much misspecification in the fac-
tor pattern matrix is required before configural invariance is not supported). Further-
more, Monte Carlo procedures would be required to help us understand the influence
of study characteristics on the tenability of the ME/I tests such as differential sample
sizes between groups, or departures from multivariate normality in one group versus
another or across time within a group.
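The kind of simulation in view can be sketched in a few lines. The code below is only a toy version of the thresholds question: it uses the difference between two groups' regression slopes as a stand-in for a factor loading difference (the sample size, loading values, and z-test are illustrative assumptions, not a full multigroup CFA) and asks how often a difference of a given size is flagged:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def rejection_rate(delta, n=200, reps=500, alpha=0.05):
    """Proportion of replications in which a loading (slope) difference
    of size delta between two groups is flagged as significant."""
    crit = stats.norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        xi1, xi2 = rng.normal(size=n), rng.normal(size=n)
        y1 = 0.8 * xi1 + rng.normal(scale=0.6, size=n)
        y2 = (0.8 + delta) * xi2 + rng.normal(scale=0.6, size=n)
        fit1 = stats.linregress(xi1, y1)
        fit2 = stats.linregress(xi2, y2)
        z = (fit1.slope - fit2.slope) / np.hypot(fit1.stderr, fit2.stderr)
        hits += abs(z) > crit
    return hits / reps

# No true difference: flagged at roughly the nominal 5% rate.
rate0 = rejection_rate(0.0)
# A loading difference of 0.3: flagged nearly always at these settings.
rate3 = rejection_rate(0.3)
print(rate0, rate3)
```

A fuller study along these lines would vary sample size ratios, distributional shape, and the invariance level being tested, which is exactly the Monte Carlo agenda proposed above.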
Summary
In conclusion, the primary point to this section was to stimulate research on ME/I
procedures oriented toward (a) increasing confidence in the validity of the procedures
and (b) shifting the application of those procedures to hypothesis-testing frameworks.
Specifically, the sensitivity and susceptibility issues are at the core of the procedures
validity. Research on those issues would stimulate greater confidence than is currently
the case that the procedures are truly detecting changes in the psychological pro-
cesses underlying respondents reactions toward a measure. It is in these very psycho-
logical processes that researchers have the most interest. However, as it currently
stands, it has not been unequivocally demonstrated that the ME/I procedures are able
to detect variations in those processes. Furthermore, research on these issues would
demonstrate what the potential boundary conditions to that ability are by noting how
susceptible the ME/I procedures are to other systematic influences characterizing the
study itself. Implicit in the paragraphs above was the suggestion that although the ME/
I procedures have been applied primarily as pretesting tools (where invariance needs to
be demonstrated before undertaking the substantive tests), they could, if valid, be pow-
erful tools in a hypothesis-testing context, a subject addressed more succinctly
below. That is, a researcher may have a priori grounds as to why differences in the psy-
chological process may exist and may wish to use the ME/I procedures to verify those
differences.
This topic is treated lightly here because Rensvold and Cheung (2001) provide an
excellent treatise on it. All readers using ME/I procedures are highly encouraged to
first read their chapter. Cheung and Rensvold brought this issue to the attention of
researchers 3 years ago (Cheung & Rensvold, 1999; Rensvold & Cheung, 1998).
Recall that even when not dealing with ME/I, the identification of the measurement
model requires fixing the loading of one item in each scale to the value of 1, which is
the referent indicator for that scale. Typical practice is simply to select an item. As
such, in a test of metric invariance, this item is by default fixed equal between groups or
across time within a group. Rensvold and Cheung (2001) noted, however, that the very
validity of the results from the metric invariance test can be jeopardized through the
typical practice. They argued instead that the selection of the referent indicator needs
to be systematically approached when tests for metric invariance are involved, and
only an item that is truly invariant should be selected as the referent. In the simplest
terms possible, the problem arises when the researcher inadvertently selects as referent
indicator an item that is not metrically invariant. The test then is inaccurate and conclu-
sions from it may not be warranted. Furthermore, if this misspecification is carried
into the ME/I tests following the test for metric invariance, their outcomes may be
equally inaccurate.
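The arithmetic behind the problem can be shown with a toy calculation (the loading values are invented, and this rescaling is only the identification step, not Rensvold and Cheung's selection procedure). Fixing a referent loading to 1 expresses every other loading relative to it, so a noninvariant referent makes the truly invariant items look noninvariant while the referent itself looks fine:

```python
import numpy as np

# Hypothetical true loadings for a three-item scale in two groups;
# only item 0 is actually noninvariant.
true_a = np.array([0.4, 0.8, 0.8])
true_b = np.array([0.8, 0.8, 0.8])

def rescale(loadings, referent):
    """Identify the latent metric by fixing the referent loading to 1,
    which rescales every other loading relative to it."""
    return loadings / loadings[referent]

# Noninvariant item 0 as referent: the two invariant items now appear
# noninvariant, and the real culprit is hidden.
bad_a, bad_b = rescale(true_a, 0), rescale(true_b, 0)
print(bad_a, bad_b)   # [1. 2. 2.] vs. [1. 1. 1.]

# A truly invariant referent (item 1) correctly isolates item 0.
good_a, good_b = rescale(true_a, 1), rescale(true_b, 1)
print(good_a, good_b)  # [0.5 1. 1.] vs. [1. 1. 1.]
```

This is why an arbitrary "first item" convention can distort the entire metric invariance test: the conclusions depend on which item anchors the scale.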
Rensvold and Cheung's (2001) chapter is highlighted less to alert readers to heed
the authors' advice (which they obviously should) and more as an example of a pro-
grammatic and systematic approach to addressing an ME/I issue. The authors did not
stop at simply alerting us to the issue and the implications it has on making inferences
from the ME/I tests. Rather, they present a set of reasonable alternatives that may be
employed in order to address the concern within our studies. The alternatives, how-
ever, were validated or legitimized through research in which Rensvold and Cheung
set out purposely to demonstrate how the alternatives worked to resolve the issue.
Their approach is exactly the type required to address other ME/I issues raised in this
article.
This second issue is directed succinctly at the question "But can it?" The issue arose
during the development of the Vandenberg and Lance (2000) article. The context for
this issue is one in which conceptual reasons justify a test for mean differences
between groups on some measure, but before doing so, the researcher subjects the data
for that measure to tests for configural and metric invariance. Furthermore, the tests
support configural invariance but not metric invariance. Subsequently, after accurately
identifying the items that are not invariant, the researcher engages in a partial metric
invariance strategy whereby the noninvariant items are freely estimated in each group
but the invariant items are fixed equal between groups. This pattern of free and fixed
constraints is kept in place as the researcher tests for mean differences using the eighth
test listed in the overview above, which is a test of latent mean differences between
groups (e.g., see Vandenberg & Self, 1993).
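The model comparison underlying this sequence of tests is the familiar chi-square difference (likelihood ratio) test between nested models. The sketch below shows the computation; the fit statistics are hypothetical illustrations, not results from any cited study.

```python
from scipy.stats import chi2

# Hypothetical fit statistics for two nested models (illustration only):
# a configural model with all loadings freely estimated in each group,
# and a partial metric model that fixes equal across groups only the
# loadings of the items judged invariant.
configural = {"chisq": 210.4, "df": 98}
partial_metric = {"chisq": 214.9, "df": 101}  # 3 loadings constrained equal

# Chi-square difference test: a nonsignificant result supports
# retaining the added equality constraints.
delta_chisq = partial_metric["chisq"] - configural["chisq"]
delta_df = partial_metric["df"] - configural["df"]
p = chi2.sf(delta_chisq, delta_df)
print(f"delta chisq = {delta_chisq:.1f}, delta df = {delta_df}, p = {p:.3f}")
```

With these illustrative numbers the difference is nonsignificant, so the partial metric constraints would be retained before moving on to the latent mean test.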
When invariance is not an issue, a test of latent mean differences is supposedly
more accurate and, therefore, more advantageous than a traditional mean difference
test because error has been partialed out of the estimate of the latent mean prior to the
difference test (Cheung & Rensvold, 2000, p. 192). This assumption has been
extended to tests of latent means under conditions of partial metric invariance. That is,
the tests of mean differences will be more accurate with the imposed constraints repre-
senting the partial metric invariance than without them because the latent means are
adjusted for or accommodating the fact that only partial, not full, metric invariance
characterizes the data. But is it? Or, more precisely, if it does result in an adjustment as
implied, how is this accomplished? At present, neither a definitive answer to this
question nor even a source that has treated the topic is available.
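To see what such an adjustment would have to accomplish, recall the mean structure of a single-factor model: each observed item mean decomposes as intercept plus loading times latent mean. The sketch below, with hypothetical parameter values, shows the full-invariance baseline case in which the latent mean difference is recovered exactly from the observed mean differences; how the estimate behaves when only partial invariance holds is precisely the open question raised above.

```python
import numpy as np

# Mean structure of a single-factor model: observed item means are
# mu = tau + lam * kappa (intercept + loading * latent mean).
# Hypothetical invariant parameters for three items:
tau = np.array([2.0, 1.5, 1.8])   # item intercepts
lam = np.array([1.0, 0.8, 0.6])   # factor loadings (item 1 is referent)

kappa = {"group1": 0.0, "group2": 0.4}  # latent means (group 1 fixed to 0)

# Implied observed means in each group:
mu1 = tau + lam * kappa["group1"]
mu2 = tau + lam * kappa["group2"]

# Given invariant tau and lam, the latent mean difference is recovered
# by regressing the observed mean differences on the loadings:
diff = mu2 - mu1
kappa_diff, *_ = np.linalg.lstsq(lam[:, None], diff, rcond=None)
print(kappa_diff)   # recovers the latent mean difference of 0.4
```

When some loadings or intercepts are noninvariant, the decomposition no longer holds for those items, and it is not obvious how freeing them "adjusts" the latent mean estimate, which is the author's point.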
Research Agenda
With respect to the referent indicator issue, Cheung and Rensvold are continuing
their work in this area. As it currently stands, some of their alternatives are complex to
apply and limited to situations in which one has only a few items to define each latent
variable (see Rensvold & Cheung, 2001). Thus, they are undertaking research to refine
the techniques and make them more user-friendly. In that same vein, Cheung
and Rensvold recognized that not all researchers share their background in methods
and, therefore, may experience difficulty navigating through some of the complexity
underlying their alternatives as they currently stand. As such, these authors set up tem-
porary aids and even offered on a limited time basis to consult with researchers.
One research avenue worthy of exploration in the referent indicator context is
whether similar conclusions as reached by Cheung and Rensvold with regard to which
item should be the referent indicator could be achieved using more conventional
exploratory factor analysis (EFA) techniques. Because configural invariance is a nec-
essary condition before applying tests for metric invariance (and, therefore, because
we have supported the a priori conceptual framework defining our measures), EFA
would not be employed in its traditional role as a post hoc means to find the factor pat-
tern within our data. Besides, this should be irrelevant if the fit indices in the test for
configural invariance were quite strong, since the same pattern should emerge from the
EFA (Hurley et al., 1997). The role for the EFA in this context would be simply to
examine (what may be described by some as an interocular eyeball test) the loadings to
identify one for each measure that is very close in value between groups not only in
terms of its loading on its focal factor but also in terms of its cross-loadings on the other
factors. The type of EFA would assume oblique factors (i.e., correlated) and, thus,
employ either an oblimin or promax rotation. Furthermore, it would use either principal
axis or maximum likelihood procedures, because the more traditional principal
components approach is not a common factor model.
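The "interocular eyeball test" described above can be operationalized as a simple between-group comparison of rotated loading matrices. The sketch below assumes the oblique EFA has already been estimated separately in each group; the loading values are hypothetical.

```python
import numpy as np

# Hypothetical promax-rotated EFA loading matrices (items x factors)
# estimated separately in each group; values for illustration only.
load_g1 = np.array([[0.75, 0.10],
                    [0.68, 0.05],
                    [0.71, 0.22],
                    [0.12, 0.80]])
load_g2 = np.array([[0.52, 0.30],
                    [0.67, 0.06],
                    [0.70, 0.35],
                    [0.10, 0.78]])

# For each item, summarize how close its full loading row (focal
# loading AND cross-loadings) is between the two groups:
row_diff = np.abs(load_g1 - load_g2).sum(axis=1)

# The best referent candidate for factor 1 is the factor-1 item whose
# loadings differ least across groups:
factor1_items = [0, 1, 2]
best = min(factor1_items, key=lambda i: row_diff[i])
print(best, row_diff)   # item at index 1 is the most stable candidate
```

This is only the comparison step, not a substitute for Rensvold and Cheung's validated procedures; it simply turns the visual inspection of loadings into an explicit, reproducible rule.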
Summary
The main point of this section is similar to that of the previous section, in that the
concern is on undertaking research to improve the accuracy or validity of the ME/I pro-
cedures. Specifically, as noted by Rensvold and Cheung (2001) and by Vandenberg
and Lance (2000), although researchers frequently invoke partial metric invariance
strategies, the practice of doing so is often questionable. It is less a case, however, that
these researchers are knowingly engaging in questionable practices and more a case
that the guidance they are using to operationalize the strategy within their research is
not well formulated. Rensvold and Cheung's work on identifying a proper referent
indicator is clearly a stream of research directed at the heart of the issue, with improv-
ing guidance as the sole motive. As noted above, however, their work is at the begin-
ning stages and needs to be refined through even more research. Similarly, researchers
justify the application of partial metric invariance strategies in part on the belief that
doing so makes the ultimate test of mean differences more accurate than if some other
strategy had been employed. Again, more research is needed to ascertain whether this
is a valid assumption.
Whether the failure involves configural invariance or one or more of the other forms of
invariance, the major point is that there do exist conceptual frameworks that lend
themselves to systematically addressing the trigger or causal issue.
Research Agenda
One study in this agenda would examine whether configural invariance fails because a
common frame of reference is not evoked for that culture through the instrument. An interesting variant
to this study would be to take, for example, the items defining Culture Y's frame of
reference for a construct and ask members of Culture X to identify what those items
mean to them. Doing so would simply add face validity to the statistical evidence
indicating why configural invariance was not supported. The statements from Culture
X's members would indicate that either the items did not have a common frame of
reference or that the items may belong to many different frames of reference from their
viewpoint.
Summary
At the risk of stating the obvious, addressing the issues in this section is important
to further establishing the validity of the ME/I tests and to extending their role beyond
the current one of being primarily a pretest tool to also being a hypothesis testing tool.
Most researchers have been satisfied with proxy variables as representations of the
possible trigger events. Little's (2000) critique of the Cheung and Rensvold (2000)
study focuses on this concern and the possibility that Cheung and Rensvold relied too
much on a proxy of the trigger event without actually operationalizing the trigger.
However, even the author's own work is symptomatic of this concern. For example,
Riordan and Vandenberg (1994) assumed that just because a person was a Korean
national, he or she would automatically subscribe to a collectivistic value system. The
same expectation was made of U.S. nationals, but in their case subscribing to an
individualistic value system. Failures to support invariance were attributed to those
cultural differences. The main point here is that the authors did not directly operationalize
the degree of subscription to collectivistic and individualistic value systems and,
hence, did not directly assess the association between the purported trigger (i.e.,
degree of subscription) and the failure to support invariance. If the operationalizations had
been included, the validity of the ME/I procedures would have been strengthened
because the findings would have shown a direct link between the purported trigger (the
subscription to the value systems) and the invariance test.
Conclusions
The ideas presented above are by no means exhaustive of the researchable issues
characterizing the ME/I arena. For example, much of what is presented in the very last
section is crossing into item response theory (IRT) and, in particular, differential item
functioning (DIF). This realization, in turn, calls for comparative research examining
the relative merits of the ME/I procedures and other procedures purportedly examin-
ing item behavior. Chan (2000) presented one such example of research in this vein. He
extended the ME/I approach into the detection of uniform and nonuniform DIF. Simi-
larly, Scandura, Williams, and Hamilton (2001) used both the ME/I approach pre-
sented here and traditional IRT analyses to examine the behavior of an organizational
politics scale. Also, as a point of contrast to using only ME/I to examine cultural differ-
ences, Zhou, Schriesheim, and Beck (2001) presented the same issues but from an IRT
perspective. It is beyond the scope of this article to attempt an overview of this
research, as it would mean first presenting an introduction to IRT. The cited articles,
however, provide excellent starting points for those interested in this topic.
In closing, the primary intent of the current article is to stimulate thinking on these
issues and, by doing so, encourage continuing research addressing both the validity
and applicability of the ME/I procedures. A second goal is to educate others that there
are reasons to carefully monitor the literature, a concern recently expressed as well
by Little (2000). There are serious implications (e.g., throwing data out) for not sup-
porting ME/I when support is desired. It would be unfortunate to take such a serious
step when it is not warranted. Furthermore, it would be equally unfortunate to con-
clude that some organizational change process was successful when indeed it was not.
Among the issues presented in this article, the most important in the sense of address-
ing it first is the sensitivity issue. It resides at the core of the procedures' validity in that
it would address directly whether the procedures are detecting what researchers
believe they should be. If they are not valid, then the other research questions are moot.
However, that is one person's opinion. One final point of this article is that much of the
content presented here came through informal exchanges with many different
colleagues from different corners of the globe. For the author at least, these types of
exchanges are a tremendous source of motivation, and he is grateful to those
individuals for their willingness to engage in these conversations.
References
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor
covariance and mean structures: The issue of partial measurement invariance. Psychologi-
cal Bulletin, 105, 456-466.
Chan, D. (1997). Racial subgroup differences in predictive validity perceptions on personality
and cognitive ability tests. Journal of Applied Psychology, 82, 311-320.
Chan, D. (2000). Detection of differential item functioning on the Kirton Adaptation-Innovation
Inventory using multiple-group mean and covariance structure analyses. Multivariate
Behavioral Research, 35, 169-200.
Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A
reconceptualization and proposed new method. Journal of Management, 25, 1-27.
Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in
cross-cultural research using structural equation modeling. Journal of Cross-Cultural Psy-
chology, 31, 187-212.
Gorsuch, R. L. (1990). Common factor analysis versus component analysis: Some well and little
known facts. Multivariate Behavioral Research, 25, 33-39.
Gulliksen, H., & Wilks, S. S. (1950). Regression tests for several samples. Psychometrika, 15,
91-114.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement
invariance in aging research. Experimental Aging Research, 18, 117-144.
Hurley, A. E., Scandura, T. A., Schriesheim, C. A., Brannick, M. T., Seers, A., Vandenberg, R. J.,
& Williams, L. J. (1997). Exploratory and confirmatory factor analysis: Guidelines, issues,
and alternatives. Journal of Organizational Behavior, 18, 667-683.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analy-
sis. Psychometrika, 34, 183-202.
Lawler, E. E., III. (1992). The ultimate advantage: Creating the high involvement organization.
San Francisco: Jossey-Bass.
Lawler, E. E., III. (1996). From the ground up: Six principles for building the new logic corpora-
tion. San Francisco: Jossey-Bass.
Zhou, X., Schriesheim, C. A., & Beck, W. (2001). The importance of measurement equivalence
in transnational research: A test of individual-level predictions about culture and the differ-
ential use of organizational influence tactics, with and without measurement equivalence.
In C. A. Schriesheim & L. L. Neider (Eds.), Research in management: Vol. 1. Equivalence
in measurement (pp. 51-98). Greenwich, CT: Information Age.
Robert J. Vandenberg received a Ph.D. in social psychology from the University of Georgia. He is a full pro-
fessor in the Department of Management of the Terry College of Business at the University of Georgia. He is
currently past division chair of the Research Methods Division of the Academy of Management. He has
served on the editorial boards of the Journal of Applied Psychology, Journal of Management, Organiza-
tional Behavior and Human Decision Processes, and Organizational Research Methods. He is currently an
associate editor for Organizational Research Methods. His research interests include research methods,
high-involvement work processes, and employee work adjustment processes.