By
Seth M. Spain,
University of Illinois, Urbana-Champaign
Andrew G. Miner,
Target Corp.
Pieter M. Kroonenberg,
Leiden University
Fritz Drasgow,
University of Illinois, Urbana-Champaign
Author Notes.
Portions of this research were presented at the 2007 Annual Meeting of the Academy of Management,
Philadelphia, PA. This manuscript is based on the first author’s master’s thesis.
Correspondence and requests for reprints should be addressed to Seth Spain, Department of Psychology,
The authors would like to thank Chuck Hulin, Sungjin Hong, and Theresa Glomb for their assistance and
Questions about the dynamic processes that drive behavior at work have been the focus of increasing
attention in recent years. Models describing behavior at work and research on momentary behavior indicate that
substantial variation exists within individuals. This paper examines the rationale behind this body of work and
explores a method of analyzing momentary work behavior using experience sampling methods. The paper also
examines a previously unused set of methods for analyzing data produced by experience sampling. These methods
are known collectively as multiway component analysis. Two archetypal techniques of multimode factor analysis,
the Parafac and the Tucker3 models, are used to analyze data from Miner, Glomb, & Hulin’s (in press) experience
sampling study of work behavior. The efficacy of these techniques for analyzing experience sampling data is
Keywords: Three-way principal components analysis, Parafac model, Tucker3 model, experience sampling, dynamic
criteria
Job Performance as Multivariate Dynamic Criteria: Experience Sampling and Multiway Component Analysis
Historically, measures of job performance have been of great interest for organizational scientists
attempting to establish the validity of selection systems (Austin & Villanova, 1992). For instance, if a city uses a
cognitive ability measure to select its police officers, applied psychologists want to validate those cognitive ability
scores against a criterion such as the new officers’ arrest records, where arrest records serve as a measure of job
performance. Validity for predicting a criterion is commonly evaluated via the correlation between the predictor
measure and criterion. For the police example, this would be the correlation between the cognitive ability scores and
Of central interest to applied psychologists is how to define job performance, both conceptually and
operationally (e.g., Austin & Villanova, 1992; Borman, 1991). Most validation research treats job performance as a
monolithic and static construct. There is considerable empirical evidence that job performance is multidimensional
(e.g., Campbell, 1991; Campbell, 1994; Campbell, McHenry, & Wise, 1990; cf., Borman & Motowidlo, 1997).
Furthermore, it is possible that job performance is not stable over time (Hulin, Henry, & Noon, 1990; Keil &
Cortina, 2001). In fact, job performance data can usually be classified by three modes: the individuals assessed, the
variables measured, and the times of measurement (e.g., Cattell, 1952; cf., Smith, 1976). Ghiselli (1956) postulated
three systematic sources of variance in job performance data, i.e. all three modes show multidimensionality. Dalal
and Hulin (2008) have called for job performance studies which include changes over time, referring to this
approach as multivariate dynamic. Even this perspective ignores qualitative differences in performance and thus
potential dimensionality in the individuals mode. This paper will deal with multidimensionality of job performance
in all its modes and takes an individual differences perspective to multivariate dynamic aspects of job performance.
To measure validly the frequency and patterning of mental processes in everyday-life situations,
procedures are needed that capture variations in self-reports of those processes. To this end, experience sampling
methodology has been developed, in which a participant reports at random or specified times on his or her
mental state or the activities in which he or she is involved at that moment. To capture those reporting instances,
participants are supplied with beepers or, more recently, with palmtop computers. With the help of these devices,
several brief surveys each day are administered to participants (Larson & Csikszentmihalyi, 1983; Csikszentmihalyi
& Larson, 1987). As experience sampling allows information to be gathered from individuals about several variables
over time, the procedure provides data to study Ghiselli’s three sources of job performance variance. A number of
methods for analyzing such experience sampling data have been used in the literature, in particular spectral analysis
and multilevel modeling. This paper will demonstrate techniques that have not been frequently used in the
organizational literature, and explore their usefulness for understanding experience sampling data.
The techniques presented in this paper are methods of multiway component analysis (see Kroonenberg,
2008, p. 16 for the distinction between mode and way). We focus on the Tucker3 model (Tucker, 1966) and the
parallel factor analysis (Parafac) model (Harshman & Lundy, 1984a, 1984b). These three-mode models can be used to
explore the individual, static, and dynamic structures of work experience data. The results show multidimensionality
in all three modes. The meaning of these dimensions and implications for understanding job performance are
discussed.
There are three sources of meaningful variance in job performance scores (Ghiselli, 1956): (1) static
dimensionality, the ordinary factorial representation of performance; (2) dynamic dimensionality, temporal factors
influencing the performance domain; and (3) individual dimensionality, variability in the type of performance across
persons in the same job. We will consider each of these sources in turn, beginning with individual dimensionality.
For similar distinctions in a more general sense, see Cattell (1952; refer to Figure 1 below).
Ghiselli described individual dimensionality as the way that individuals assigned to the same job within an
organization perform that specific job in qualitatively different ways, not merely with different proficiency.
Dimensions in the individuals-mode differentiate types of employees. For example, two salespersons may provide
the same economic benefit to the organization, but one contributes by directly making sales, while the other
contributes by creating goodwill, encouraging customers to make purchases throughout the store (Ghiselli, 1956, p.
4). Such individual difference dimensions of performance could be important in a variety of situations. If the
organization derives the same economic benefit from different employees in different ways, these differences should
Static dimensionality refers to the latent structure of the variables measuring job performance. Historically,
the study of job performance was characterized by a search for the “ultimate criterion”, a comprehensive index of
performance. It has been pointed out that this is an inappropriate way to conceptualize performance (e.g., Ghiselli,
1956; Dunnette, 1963). There is evidence across many jobs that overall job performance can consist of as many as
eight dimensions (Campbell, 1994; Campbell, McHenry, & Wise, 1990). At minimum, job performance researchers
should consider both task performance, the technical core of the job, and contextual performance, the social and
non-technical contributions an individual makes at work (Borman & Motowidlo, 1997). These dimensions have
been found to independently contribute to supervisors’ perceptions of their subordinates’ overall job performance
(Motowidlo & Van Scotter, 1994). For instance, consider two salespeople who have equal sales. One is known to be
a loner, while the other gives advice and assistance to coworkers. The latter will be viewed as the superior
It may be necessary to have a single measure of job performance, for example when establishing the
predictive validity of a selection battery (Hulin, 1982). One might then construct some composite of scores on
various performance dimensions. The weighting scheme chosen to combine different dimensions of job performance
has been found to greatly impact those validity estimates in Monte Carlo simulations (Murphy & Shiarella, 1997).
These authors found that 34% of the variance in validity estimates over simulated samples was accounted for by the
Finally, the dynamic nature of performance criteria is important to consider for employee selection. There
are well-documented changes in the predictive relationships between selection instruments and performance
measures (Alvarez & Hulin, 1972). In an academic example, SAT scores are most predictive of first semester
college grade point average, and the correlations between SAT and GPA decrease after that (e.g., Humphreys,
1976). These changes in predictive validity have been found to be virtually ubiquitous (Hulin et al., 1990). The fact
that predictive validity changes over time is usually called validity degradation and is the primary approach to the
study of dynamic criteria, though there are other approaches (e.g., Austin, Humphreys, & Hulin, 1989; Barrett,
Caldwell, & Alexander, 1985). This work is summarized in meta-analyses by Hulin et al. (1990) and Keil and
Cortina (2001), which show that validity degradation is pervasive and occurs with all manner of ability predictors.
If performance changes over time, it would be useful to find predictors of the change itself. Personality
measures have been used in addition to cognitive ability measures to predict individual growth curves for
performance criteria (e.g., Zyphur, Bradley, Landis, & Thoresen, 2008). The results of this study indicate that both
cognitive ability and conscientiousness predict initial academic performance, but only conscientiousness predicts
performance trajectories. This may happen because early performance is a transition phase of skill acquisition and
later performance is a maintenance phase (Murphy, 1989). In a study testing that proposition, both conscientiousness
and extraversion predicted maintenance performance differences between participants, but only
conscientiousness predicted maintenance performance growth, and agreeableness and openness to experience
predicted both performance differences and trends in the transition phase (Thoresen, Bradley, Bliese, & Thoresen,
2004). These attempts remain univariate in their approach to criterion measurement, however. Theoretical models
should guide the sampling of variables as research progresses into a multivariate dynamic framework.
One such model is provided by Affective Events Theory (Weiss & Cropanzano, 1996). These authors
suggest that workplace behavior comes in two basic kinds: affect-driven and judgment-driven. Workplace events
cause affective reactions, and these affective reactions directly influence affect-driven behavior. But these affective
reactions also influence job attitudes that in turn directly influence judgment-driven behaviors. It is hypothesized
that momentary affect should thus show stronger relationships with momentary behaviors such as work withdrawal
(e.g., taking long coffee breaks or surfing the web) and that job attitudes should have stronger relationships with
more considered behaviors such as job withdrawal (e.g., job search, turnover intentions, quitting). Furthermore, it is
expected that individual differences in personality will moderate both the link between events and predict the
In complement to Affective Events Theory, the Episodic Process Model (Beal, Weiss, Barros, &
MacDermid, 2005) suggests that there will be important momentary fluctuations in the affective and regulatory
resources available for employees to apply to performance behaviors. This model articulates reasons why
performance behaviors should meaningfully vary within persons over short time periods. For example, if my
supervisor yells at me, and I then need to interact with a client, I may have to regulate my emotional display to
appear positive to the client. This act of emotion regulation uses up some of my regulatory resources, and may
therefore make it more difficult for me to focus my attention on a report I need to write later in the day. Obtaining
evidence to test such a model requires research designs that are capable of untangling both within- and between-
individual variability.
Experience sampling methods are ideally suited to explore dynamic models of work behavior because
measurements may be taken throughout the work day on several variables. For example, Weiss, Nicholas, and Daus
(1999) tested propositions drawn from Affective Events Theory about the influence of job beliefs and moods on overall
job satisfaction. Dalal, Lam, Weiss, Welch, and Hulin (2009) investigated the dimensionality of organizational
citizenship and counterproductive behaviors, and their relationships to state affect and performance. Miner, Glomb,
and Hulin (2005; in press) found general support for the proposition that individuals modify their work behaviors in response to mood
changes. These studies demonstrate that a substantial portion of the variance in organizationally-relevant variables is
Experience sampling data are three-mode (persons × variables × occasions) and are frequently analyzed
with multilevel models, with the occasions mode nested within persons. This paper addresses whether such data may
be examined using multiway analysis, as illustrated in the next section. Situations where these analyses would be
useful are when the research questions run along these lines: do the structures of counterproductive and citizenship
behaviors change over time? or, even more fitting for multiway methods, do different groups of employees display
different patterns of change in these structures? Multiway models are a natural way of addressing such questions,
but may prove difficult to implement with experience sampling data. For instance, in the data presented here, the
measurement occasions occurred randomly within two-hour windows. Therefore, Participant 1’s time 1 is not
strictly simultaneous with Participant 2’s, but should be basically equivalent. One goal of this paper is to serve as a
methodological exercise in an attempt to judge whether this assumption of near-simultaneity is tenable. This concern
Multiway Analyses
When viewed in the way specified by Ghiselli (1956), the performance domain is defined by a three-way
data relation box (Cattell, 1952). This box is defined by its three modes: individuals, variables, and occasions. This
box is referred to as an array, a generalization of the matrix concept (see Figure 1). In Figure 1, subjects, variables,
and occasions are all both modes and ways. The front side of the array in Figure 1, the subjects by variables “slice”,
is the matrix we usually handle in psychological data. Each measurement occasion in a longitudinal or experience
sampling study would have such a matrix. Stacking these slices as in Figure 1 leads to a three-mode array. We next
consider the two classes of multiway component analysis used in this paper, the Parafac model
and the Tucker3 model. Both are component analysis models, although both were initially referred to as
factor analysis (e.g., Tucker, 1966). In this paper, we refer to components, except when discussing the historical
Cattell (1944) presented a solution to the problem of rotational freedom in standard factor analysis, which
he called the principle of parallel proportional profiles. Cattell’s idea was to measure the same individuals on the
same variables across two systematically different occasions, say before and after an experimental manipulation. If
one recovers the same factors at both occasions, then this common orientation of the factors, or these parallel
proportional profiles, could according to Cattell occur only if the factors represent real psychological constructs.
The Parafac model (Harshman, 1970) is essentially both a model formulation and computational solution of
this principle applied to an arbitrary number of conditions or occasions. Moreover, Harshman (1970) and Carroll &
Chang (1970) showed that the parallel proportional profile principle could be extended to the n-way case. The
Parafac model is mathematically identical to the canonical decomposition model in multidimensional scaling
(Carroll & Chang, 1970). In this paper, we will refer only to the former name. The Parafac model is presented in
where Xk is the kth frontal slice of the data array X (matrices are boldfaced, and multi-way arrays boldfaced and
underlined). A is the coefficient matrix for the first mode, B' is the transpose of the coefficient matrix of the second
mode, Dk is a diagonal matrix containing the kth row of the third mode coefficient matrix along its diagonal, i.e. dssk
= cks. Ek is the kth frontal slice of residual array E. In sum notation this becomes
which shows that for the kth slice of the data array the dssk-coefficients provide weights for the coefficients of the
other two modes. They are the “proportionality” constants that capture the systematic variation between slices of the
data array (cf. Harshman & Lundy, 1984a). A dssk (cks) coefficient defines the relative importance of component s to
occasion k. It should be noted that the model is in fact proportional with respect to all its modes.
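The displayed equations for this passage do not appear in the text above. Working only from the definitions given here (Xk, A, B', Dk, Ek, and dssk = cks), the slice-wise and sum notations of the Parafac model can plausibly be reconstructed as follows. The component index s, its upper limit S, and the equation numbers are assumptions inferred from the surrounding text, not taken from the original display.

```latex
\begin{align}
\mathbf{X}_k &= \mathbf{A}\,\mathbf{D}_k\,\mathbf{B}' + \mathbf{E}_k,
  \qquad k = 1, \dots, K \tag{1} \\
x_{ijk} &= \sum_{s=1}^{S} a_{is}\, b_{js}\, d_{ssk} + e_{ijk}
         = \sum_{s=1}^{S} a_{is}\, b_{js}\, c_{ks} + e_{ijk} \tag{2}
\end{align}
```

Written this way, each component s contributes the same profile pair (a_is, b_js) to every slice, scaled only by c_ks, which is the sense in which the profiles are parallel and proportional.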
It is important to distinguish two types of variation: system variation pertaining to the system as a whole
and object variation which pertains to variation of specific (groups of) objects, i.e. part of the system but not the
system as a whole. For the Parafac model to be applicable, sufficient system variation should be present in the
multiway data. System variation involves a conceptualization of components that lie “in the system” under
examination such that the same instances of the components affect all of the objects; there should be parallel
proportional profiles. The object variation model involves a conceptualization of components as having an instance
of the component “in” each object under study, such that the components are properties of (groups of) individuals
under study. The system model implies synchronous variation in component influences across levels of the third
mode (e.g., occasions), while the object model does not imply such synchronous variation (cf. Harshman & Lundy,
1984a, pp. 130-133). The basic Parafac model has no provision for modeling object variation. Therefore, if
substantial object variation is present, a more complicated model such as the Tucker3 should be used.
Tucker (1964, 1966) formally introduced the three-mode factor analytic model for three-way data as an
extension to the well-known two-mode model for two-way data. This model was later named after him by
Kroonenberg and de Leeuw (1980). The Tucker family of models can account for both system and object variability.
Of these models the Tucker3 is the most complex and it can also be shown to be more complex than the Parafac
Here, aip represents the coefficient of the ith individual on the pth A-mode factor, bjq represents the
coefficient of the jth variable on the qth variable factor, and ckr represents the coefficient of the kth time point on
the rth time factor. Finally, gpqr represents the element of the core array which links the pth, qth, and rth factors in the A-,
B-, and C-modes, respectively, with each other. It can also be interpreted as a representation of the interaction
between the factors of the three modes (for further details see Kroonenberg, 2008, Section 4.8).
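The displayed Tucker3 equation that these symbol definitions refer to is not present in the text above. From the definitions of aip, bjq, ckr, and gpqr, the scalar form can plausibly be reconstructed as below. The upper limits P, Q, and R, the residual term e_ijk, and the equation number (inferred from the later reference to Equation 3) are assumptions.

```latex
\begin{equation}
x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
          a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} + e_{ijk} \tag{3}
\end{equation}
```

Setting P = Q = R = S and allowing only the superdiagonal core elements g_sss to be nonzero reduces this expression to the Parafac sum notation, which is one way to see why the Tucker3 model is the more general of the two.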
Thus the columns of the coefficient matrices A, B, and C are the components which may be represented as
vectors in graphs displaying the component spaces. For convenience, we often speak of component scores for
subjects, component loadings for variables, and component scores or weights for time points. However, only the variable
loadings can be understood as in the two-way case, i.e. as variable-component correlations, and then only if the variables
have been centered and normalized in a specific way (see below). In most cases, the interpretation of the component
scores proceeds in the same way as component scores in two-way component analyses where the components are in
normalized coordinates, i.e. have unit sums of squares. Another way of thinking about the component matrices is in
terms of “idealized” instances of the real objects in that mode; i.e. ideal individuals, ideal (or latent) variables, and
ideal measurement occasions or trends (Tucker, 1966; Kroonenberg, 2008, Section 9.4).
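The structural part of the two models can be sketched numerically. The following is a minimal illustration with synthetic coefficient matrices, not the analysis reported in this paper (which used 3WayPack); the array sizes, the function name tucker3_structure, and all variable names are hypothetical. It builds a Tucker3 structural array from A, B, C, and a core array G, and then checks that a Parafac model is the special case with a superdiagonal core.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 6, 5, 4   # toy numbers of individuals, variables, occasions
P, Q, R = 3, 2, 2   # components per mode (a 3x2x2 Tucker3 model)

A = rng.normal(size=(I, P))     # individuals-mode coefficients
B = rng.normal(size=(J, Q))     # variables-mode coefficients
C = rng.normal(size=(K, R))     # occasions-mode coefficients
G = rng.normal(size=(P, Q, R))  # core array linking the components


def tucker3_structure(A, B, C, G):
    """Sum over p, q, r of a_ip * b_jq * c_kr * g_pqr for every (i, j, k)."""
    return np.einsum("ip,jq,kr,pqr->ijk", A, B, C, G)


X_hat = tucker3_structure(A, B, C, G)

# Parafac as a constrained Tucker3: equal numbers of components in all
# modes and a core that is zero except for ones on the superdiagonal.
S = 2
A2 = rng.normal(size=(I, S))
B2 = rng.normal(size=(J, S))
C2 = rng.normal(size=(K, S))
G_diag = np.zeros((S, S, S))
G_diag[np.arange(S), np.arange(S), np.arange(S)] = 1.0
X_parafac = tucker3_structure(A2, B2, C2, G_diag)

# The same array built slice by slice as A D_k B', with D_k holding the
# kth row of the occasions-mode matrix on its diagonal.
X_slices = np.stack([A2 @ np.diag(C2[k]) @ B2.T for k in range(K)], axis=2)
assert np.allclose(X_parafac, X_slices)
```

The off-superdiagonal core elements are what a Tucker3 model adds: they let, for example, the first individuals component combine with the second variables component, which the Parafac structure rules out.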
The core array is what allows Tucker models to account for object variability where the Parafac model
cannot. So while Component 1 in the individuals-mode must correspond to Component 1 in the variables and time
modes in a Parafac analysis, the same does not apply to a Tucker3 model. The core allows three-way interactions
between the components in different modes and for different numbers of components in each mode. A number of
ways of interpreting the core are possible. One may view the core array as “idealized data”, such that each element
represents the score on the ideal variable by the ideal person at the ideal measurement occasion corresponding to
that core element (cf. Tucker, 1966). As the scalar notation makes clear (Equation 3), one can also think of the core
elements as regression weights for predicting the original data using the three-way interaction of the components
(Kroonenberg, 2008, pp. 225-228). Thus, core elements give the importance of a combination of mode components
Research Questions
Inn, Hulin, and Tucker (1972) offer perhaps the only multivariate dynamic criterion study using multiway
analysis. They fit a Tucker3 model to data from a sample of 184 airline reservation agents measured five times on 11
performance variables. These authors found a 4×3×3 solution, demonstrating multidimensionality in each mode,
consistent with Ghiselli’s three sources conjecture (1956). Given the dearth of previous research, this study is
largely exploratory and we propose few a priori hypotheses. The major expectation of this study is that multiple
components will be extracted in each mode of the data, consistent with Ghiselli (1956) and Inn et al. (1972). Given
previous work on affect cycles in experience sampling designs, it is likely that at least one of the time components
will show cyclic trends in its loadings (Weiss et al., 1999). These loadings are analogous to
variable loadings on growth parameters in latent growth curve analysis (e.g., Chan, 1998), though they are both
exploratory in character and expected to display cyclical rather than curvilinear patterns. Further, we expect that we
should be able to recover three B mode components for the objective, behavioral, and self-rated performance
variables discussed in the methods section below. We have no hypotheses regarding individuals-mode components
This paper is primarily a methodological exploration. We want to know if it is possible to fit well-behaved
multiway models to experience sampling data. We expect to be able to do so, and the only remaining question is
whether the Parafac model will be sufficient. The Parafac model is a constrained Tucker3 model, which itself can be
seen as a constrained two-way principal components analysis (PCA) of the wide combination-mode matricization (e.g.,
I×JK) of the data array X. Therefore, the sums of squares for each model (SSmod) must satisfy
SSmod(Parafac) ≤ SSmod(Tucker3) ≤ SSmod(PCA),
provided all models have equivalent numbers of components (Kiers, 1991). Therefore, Tucker3 and PCA models
will provide similar though better fits to the data than an equivalent Parafac model. However, if the Parafac model is
appropriate, the additional parameters of these models will either model systematic variation redundantly or simply
model noise (Smilde, Bro, & Geladi, 2004, pp. 154 - 155). If a degenerate Parafac solution is obtained, it usually
METHOD
The experience sampling data analyzed here were originally collected by Miner et al. (in press), who
sampled one hundred individuals from a pool of about 300 incumbents at a Fortune 500 technology-sector call
center. These individuals served in either customer service or technical support capacities. Sixty-seven of the
invitees actually participated. Of these, 55 participants had sufficient criterion data for analyses. Those 55
participants were mostly white (94%) and non-students (85%). Fifty-one percent were male, 78% had at least
some college education, and 66% worked 9-hour shifts. The mean age of participants was 34 years (sd = 11 years)
and their mean tenure with the organization was 2.4 years (sd = 2.3 years).
Participants’ palmtop computers signaled them by beeping either four or five times a day, depending on
whether they worked a five-day, 9-hour shift or a four-day, 11-hour shift, respectively. Participants completed surveys
each workday for three weeks, resulting in a total of sixty possible measurement opportunities for each respondent.
Participants were excluded from analyses if they answered one-half (30) or fewer of the surveys. Participants tended
to miss most surveys late in the study, so we used the first 40 measurement occasions.
Experience sampling measures. Participants answered questions about whether they solved the customer’s
problem, made three ratings of their own performance, and responded to eight items sampling behaviors related to
task performance, organizational citizenship behavior, and work withdrawal. Participants were also asked if they
were engaged in several work behaviors, which were scored dichotomously. The behavioral items were aggregated
into unit weighted composites for focal performance (solving the problem, doing work tasks), being on the job (at
workstation, away but on the job), negative work behaviors (withdrawal, doing personal tasks), positive citizenship
behaviors (helping, doing organizational citizenship), and neutral behaviors (at lunch, on break). In addition to the
self-reported measures, performance was also indicated by objective measures of average call handle time, average
call wait time, and average call hold time in the thirty-minute window in which the signal occurred.
measures in a post-study survey. These measures can be used to explore the meaning of the individuals-mode
components. These measures were the Trait Meta-Mood Scale (Salovey, Mayer, Goldman, Turvey & Palfai, 1995)
and the International Personality Item Pool Extraversion and Neuroticism scales (IPIP; Goldberg, 2001). These measures
are linked to dispositional affectivity, which is predicted by Affective Events Theory to moderate an individual’s
reactions to affective events as well as to directly influence affective reactions to events. Trait meta-mood is a form
of emotional intelligence, and related to an individual’s ability to recognize and regulate their feelings. IPIP
extraversion is primarily an index of sociability and gregariousness, while neuroticism tends to focus on
anxiety/stress reactance. Participants also completed the Job Descriptive Index, a measure of job satisfaction (Smith,
Kendall, & Hulin, 1969). Affective Events Theory predicts that job satisfaction should be more related to judgment-
driven behaviors than to impulsive behaviors. Additionally, job satisfaction has an empirical association with job
performance, so it is reasonable that some individuals-mode components may be associated with job satisfaction.
Finally, self-reports of average work and job withdrawal and citizenship performance were assessed as well.
Participants tended to miss many surveys in the last week, resulting in substantial missing data after the
fortieth measurement occasion. Therefore, the data we analyzed consisted of 55 participants by 11 variables by 40
measurement occasions (24200 data points). About 17295 data points contained valid entries, and about 6905 did
not (28.5% missing). Missing entries were handled by imputing model-implied values. Missing values were
initialized with two-way MANOVA estimates. The information in a three-way array is quite rich, and with an array
of sufficient size, a substantial portion of missing values can be handled relatively reliably, although doing so often requires special
care in defining starting values for the missing-data locations (Smilde, Bro, & Geladi, 2004). Though this dataset is
not large enough to allow for cross-validation, the stability of the results was examined using several random starts.
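The arithmetic and the general idea of imputing model-implied values can be sketched as follows. This is only an illustration under stated assumptions: a rank-2 truncated SVD of a small two-way matrix stands in for the three-way model, a simple additive row-plus-column estimate stands in for the initialization, and the function name impute_low_rank and the synthetic data are hypothetical. It is not the procedure implemented in 3WayPack.

```python
import numpy as np

# Size of the analyzed array and share of missing entries, from the text.
n_total = 55 * 11 * 40
n_missing = 6905
print(n_total, round(100 * n_missing / n_total, 1))  # 24200 28.5


def impute_low_rank(M, rank=2, n_iter=50):
    """Alternate between fitting a low-rank model and refilling missing cells."""
    missing = np.isnan(M)
    grand = np.nanmean(M)
    row = np.nanmean(M, axis=1, keepdims=True) - grand
    col = np.nanmean(M, axis=0, keepdims=True) - grand
    # Starting values: additive row + column effects (an assumed stand-in).
    filled = np.where(missing, grand + row + col, M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        fitted = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed cells are kept; only missing cells take model-implied values.
        filled = np.where(missing, fitted, M)
    return filled


# Synthetic rank-2 matrix with roughly 28.5% of its cells removed at random.
rng = np.random.default_rng(2)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 12))
M = true.copy()
M[rng.random(M.shape) < 0.285] = np.nan
M_filled = impute_low_rank(M)
```

Because the result of such an iteration can depend on the starting values, rerunning it from several different starts, as was done for the analyses here, is a natural stability check.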
RESULTS
Both Parafac and Tucker3 models were fit to the data. Analyses were performed using the 3WayPack suite
of programs (Kroonenberg, 1994; 2004). Multiway models are compared based on variance accounted for (VAF)
measures. The VAF of the model was divided by the number of factors estimated, in order to compare the model’s
explanatory power to its parsimony (Timmerman & Kiers, 2000). Similar indices were discussed in Kroonenberg
and Oort (2003; Kroonenberg, 2008, pp. 179ff.). These are not formal fit indices, but aid the researcher’s decision-
Component models assume that variables are ratio-scaled (Harshman & Lundy, 1984a). However,
psychological measurements mostly have interval properties. Centering will change the data into ratio-scale
deviations from the mean so that they can be handled in a component analysis. Harshman and Lundy (1984b) and
Kroonenberg (2008, Chapter 6) discuss appropriate types of centering for three-way data, and centering across
subjects per variable-time point combination is their recommended procedure for the present data.
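The preprocessing described in this section can be sketched on a synthetic array. The sketch assumes a subjects × variables × occasions layout, uses randomly generated data with missing cells, and reads "size-normalized" as division by each variable's root mean square over all subject-occasion combinations; that reading, and all variable names, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in: 55 subjects x 11 variables x 40 occasions.
X = rng.normal(loc=5.0, scale=2.0, size=(55, 11, 40))
X[rng.random(X.shape) < 0.285] = np.nan  # mimic the missing-data rate

# Centering across subjects: remove the mean of every variable-occasion
# combination, computed over the subjects with observed values.
cell_means = np.nanmean(X, axis=0, keepdims=True)  # shape (1, 11, 40)
Xc = X - cell_means

# Normalization per variable over all subject-occasion combinations
# (assumed here to mean division by the root mean square).
rms = np.sqrt(np.nanmean(Xc**2, axis=(0, 2), keepdims=True))  # shape (1, 11, 1)
Z = Xc / rms
```

The array cell_means holds the variable-by-occasion means, so it can be read as the profile of the average person against which the deviation scores in Z are interpreted.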
Our data were accordingly centered across the subject mode, such that the mean for each variable at each
measurement occasion was subtracted from each measurement. This is the most appropriate centering strategy
because the starting point of the study is arbitrary. An accompanying advantage of this centering is that the matrix of
variable-occasion means can be seen as the score of the average person so that the deviation scores can be
interpreted with respect to this average person. Additionally, the three objective measures (call handle time, call wait
time, and call hold time) were corrected for non-normality by taking their natural logarithms. Finally, the variables
were size-normalized over all subjects-time point combinations so that the values of the variables were comparable
as standard scores across variables and remained comparable per variable across all occasions. To be precise, the
that given proper scaling of the output (see below) the component loadings for the variables can be interpreted as
Descriptive statistics
Presenting even descriptive statistics for the entire data array would be cumbersome. Thus descriptive
statistics for the A-mode matricization are presented in Table 1. Matricization is the unfolding of the three-mode
data array into a two-way data matrix, extended along one of the modes of data classification. Given an i×j×k data
array, the A-mode matricization is of order ik×j (cf. Kiers, 2000), where the subject mode (i) is the slower-running
mode such that the rows of the matrix consist of the first participant’s 40 measurement occasions followed by the
second participant’s 40 measurement occasions, and so on. Table 1 presents the means, standard deviations, and
correlations for the variables in the A-mode matricization. This correlation matrix can be interpreted as usual,
Means Analysis
Prior to three-way analysis, we explored the mean time trends for the variables, i.e. trends for means which
were removed before the three-way analyses. Figures 2 through 4 present the time trends for seemingly similar
variables. Figure 2 displays the self-rated variables: average handle time, average quality of service, and overall
service. These variables display very similar time trends. Figure 3 shows the trends for the behavioral variables: the
focal performance composite, withdrawal behaviors, citizenship behaviors, neutral behaviors, and location. Again,
these display highly similar trends. The objective measures, average handle time, average wait time, and average
series for these variables averaged over subjects. Two components were recovered: the self-rated variables loaded
strongly on one, with location, focal performance, withdrawal and neutral behaviors loading strongly on the other.
Log-average handle time and citizenship performance also showed reasonable loadings on this component. The call
duration and call wait time variables did not load strongly on either component.
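The means analysis above can be sketched roughly as follows, assuming the data are stored as a subjects × occasions × variables array with NaN marking missing responses. The function names and toy data are hypothetical, and the unrotated principal components of the correlation matrix are used here only as one plausible reading of the procedure, not as the authors' exact method.

```python
import numpy as np

def occasion_means_and_deviations(data):
    """data: subjects x occasions x variables, NaN for missing responses."""
    means = np.nanmean(data, axis=0)   # occasions x variables average series
    deviations = data - means          # deviation scores from the average person
    return means, deviations

def pca_loadings(series, n_components=2):
    """Unrotated component loadings for the columns of an
    occasions x variables matrix of average time series."""
    z = (series - series.mean(axis=0)) / series.std(axis=0, ddof=1)
    corr = z.T @ z / (z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors so loadings are variable-component correlations.
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Hypothetical toy data: 10 subjects, 40 occasions, 4 variables.
rng = np.random.default_rng(1)
toy = rng.normal(size=(10, 40, 4))
means, deviations = occasion_means_and_deviations(toy)
loadings = pca_loadings(means, n_components=2)
```

The deviation scores returned by the first function are the quantities that enter the three-way analyses, while the average series are analyzed separately.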
Three-way analysis
Based on several initial solutions, we decided to eliminate one participant from the three-way analyses.
This participant exerted undue influence on the solutions. After this exclusion, the preferred models are the 2×2×1,
3×2×2, 4×2×2, and 4×2×4 Tucker models and a two-component Parafac model. The two-component Parafac model
Parafac analyses. We fit a two-component Parafac model using multiple random starts. Three of these
starts produced Tucker’s congruence coefficients greater than |.85|. The components in the two-component solutions
are highly negatively congruent, with an average congruence coefficient of -0.84 and an average core consistency of
-25.9. These results indicate that the higher component Parafac solutions are degenerate. There are several options
for dealing with degenerate solutions. One is to impose orthogonality on the factors in one mode. Another option is
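Tucker's congruence coefficient, used above to compare components, is the cosine of the angle between two component vectors: the sum of their cross-products divided by the square root of the product of their sums of squares. A minimal sketch follows; the function name and the example vectors are hypothetical.

```python
import numpy as np

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two component vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Proportional vectors are perfectly congruent; reversing the sign of one
# of them makes the pair perfectly negatively congruent.
same_direction = tucker_congruence([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_direction = tucker_congruence([1.0, 2.0, 3.0], [-2.0, -4.0, -6.0])
```

Strongly negative congruence between the two components of a two-component Parafac solution, as reported above, is a common warning sign of degeneracy.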
Tucker3 analyses. Figure 5 displays the time components from the 3×2×2 Tucker model solution with loess
curves fitted to them with 70% of the data used for the local regressions (Cleveland & Devlin, 1988). The first
component has been rotated to optimal constancy. The second component is orthogonal to the first one. There is no
clear periodicity (i.e., performance cycles) in these time components. However, to uncover such periodicity would
require that the measurement occasions were spaced evenly and truly taken at the same time for all participants. We
constructed joint biplots to investigate the relationships between individuals and variables for the two time
components (e.g., Kroonenberg, 2008, Appendix B). Variables are displayed as vectors in the two-dimensional
space, and individuals projecting highly on a variable vector scored highly on it; individuals plotted near each other
Figure 6 is the joint biplot of individuals and variables for time component 1 (variable coordinates
presented in Table 4). Since time component 1 is largely constant, this biplot represents individual differences in
how the participants carried out their job duties over time. For instance, we can see how participants 19, 20, 23, 46,
and 50 had high scores on call wait time, but low scores on call duration and handle time. On the other hand,
participants 25, 40, and 48 scored highly on call duration and handle time but low on call wait time. Similarly,
participants 20, 23, and 27 had high scores on citizenship while participants 3 and 9 had very low scores on
citizenship. The objective call time measures and citizenship are the most important variables for this time
component.
Figure 7 displays the biplot for the second time component (variable coordinates are also presented in
Table 4). This component showed much more pronounced fluctuations than component 1 and an increasing trend
towards the end of the study. This suggests that the relationships displayed in Figure 7 became
more important over the course of the study. This increase in importance is especially due to handling time and
citizenship, but also withdrawal, focal performance, focal location, call duration, and quality of service. Participants
with positive scores, such as 30, show increasing trends over time. Those with negative scores, like 3 and 9, show
decreases.
To aid interpretation of the individual components, we examined the correlations of these components with
individual difference measures (e.g., Van Mechelen & Kiers, 1999). The first component correlated positively with
meta-mood clarity and negatively with overall job withdrawal (r = .30, p = .04 and r = -.32, p = .03, respectively).
The second component correlated positively with meta-mood repair (r = .29, p = .05), marginally positively with
overall citizenship performance (r = .27, p = .07), and marginally negatively with overall work withdrawal (r = -.25, p
= .10). The third component correlated positively with overall citizenship performance (r = .38, p = .01) and marginally
negatively with satisfaction with supervisor (r = -.23, p = .10). The first component has perhaps the clearest
interpretation, as these individuals report great awareness of their moods and low overall job withdrawal: these
employees are likely very in tune with their emotions and unlikely to quit. They are probably having a more
positive work experience than the average employee. The second component also has a reasonably clear
interpretation. Given that individuals are able to intentionally repair their moods and report high levels of citizenship
performance and low work withdrawal, these are likely individuals who can deal with negative emotions effectively
and remain engaged with their work. The third component may again reflect some form of engagement. Employees
scoring high on this component report high citizenship, in spite of lower ratings on supervisory satisfaction.
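The correlations reported above relate subject-mode component scores to separately measured individual difference variables. The sketch below, using hypothetical simulated scores and a hypothetical function name, computes a Pearson correlation and the t statistic on which its significance test is based; the p value would then be obtained from a t distribution with n - 2 degrees of freedom.

```python
import numpy as np

def pearson_r_and_t(scores, measure):
    """Pearson correlation and its t statistic (n - 2 degrees of freedom)."""
    scores = np.asarray(scores, dtype=float)
    measure = np.asarray(measure, dtype=float)
    ds = scores - scores.mean()
    dm = measure - measure.mean()
    r = float(ds @ dm / np.sqrt((ds @ ds) * (dm @ dm)))
    n = scores.size
    t = float(r * np.sqrt((n - 2) / (1.0 - r ** 2)))
    return r, t

# Hypothetical data: component scores for 50 participants and a trait
# measure that is weakly related to them.
rng = np.random.default_rng(2)
component_scores = rng.normal(size=50)
trait_measure = 0.3 * component_scores + rng.normal(size=50)
r, t = pearson_r_and_t(component_scores, trait_measure)
```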
The variables-mode component loadings are displayed in Table 6. The two components cleanly separate the
objective and behavioral measures, and the self-reported measures do not load substantially on either component.
Among the objective measures, call duration and handle time have the opposite sign from call wait time. Citizenship
performance has the highest loading among the behavioral measures, with focal performance, location, and withdrawal
all loading between .31 and .34. Interestingly, all of these loadings, even withdrawal, are positive. It is also
interesting that the more direct self-reported measures of performance do not systematically load on these
components.
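The way the Tucker3 model combines the subject, variable, and time components through the core array can be sketched as follows. This is an illustration under stated assumptions, not the TUCKALS implementation: the component matrices are random orthonormal matrices, the number of subjects (53) is hypothetical, and the 11 variables, 40 occasions, and 3×2×2 core follow the description in the text.

```python
import numpy as np

def tucker3_reconstruct(A, B, C, G):
    """Model values x_ijk = sum over p, q, r of a_ip * b_jq * c_kr * g_pqr."""
    return np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

def random_orthonormal(rows, cols, rng):
    """Random matrix with orthonormal columns (hypothetical components)."""
    q, _ = np.linalg.qr(rng.normal(size=(rows, cols)))
    return q

rng = np.random.default_rng(3)
A = random_orthonormal(53, 3, rng)   # subjects x 3 components (53 is assumed)
B = random_orthonormal(11, 2, rng)   # variables x 2 components
C = random_orthonormal(40, 2, rng)   # occasions x 2 components
G = rng.normal(size=(3, 2, 2))       # core array linking the components

X_hat = tucker3_reconstruct(A, B, C, G)
fitted_ss = np.sum(X_hat ** 2)
core_ss = np.sum(G ** 2)
# With orthonormal components the squared core elements partition the fitted
# sum of squares, so each element's share can be read as its contribution.
share = G ** 2 / core_ss
```

Dividing the squared core elements by the total sum of squares of the data, rather than by the fitted sum of squares, would give proportions of total variance explained for each combination of components.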
The Tucker core is shown in Table 7. The combination of the first components in all modes has the largest
core element, and accounts for the most variance. The most substantial elements following this are the combination
of the second subject and the behavioral variables component with each of time components (g221 and g222,
DISCUSSION
The Tucker3 model provides a parsimonious explanation of the observed data, even though the overall
variance explained for all of the models is rather low. Experience sampling data prove difficult to analyze for a
number of reasons. The procedures are intrusive, which can lead to large amounts of missing data when participants
are too busy or simply choose not to answer questions. Furthermore, individuals were studied over a relatively short
time and were measured at points very close together in time. Perhaps if measurement occasions were sampled at
fully equivalent time-points across individuals, or if the study took place over organizationally meaningful epochs,
such as a new product rollout or layoff announcement, the Parafac model might have proven more useful in
This theme bears some greater consideration. Often with longitudinal and experience sampling studies, in
particular, researchers wade into the stream of events with no real care as to when they do so. Put simply, time one
often is not really time one, but an arbitrary starting point for the study. Likewise, the studies often end at an equally
arbitrary point in time. Multiway thinking can help encourage researchers to treat time as an important facet in the
This study attempts to fit a dynamic structural model to data derived from an experience sampling design.
ESM studies often come in one of two types: event-triggered and signal-triggered. In event-triggered studies,
participants opt to answer surveys when a particular event or type of event occurs. In signal-triggered studies,
individuals respond to surveys when signaled by their beeper or palmtop computer. When the ESM study is event-
triggered, measurement occasions can only be viewed as nested within individuals. In signal-triggered designs,
signals will be random within time windows. Our results indicate that these measurement occasions may be
This paper demonstrates that dynamic and structural investigations may be conducted with signal-triggered
experience sampling data by using methods such as multiway analysis. Tools such as multilevel modeling
are also useful, but might sacrifice information that could be obtained by treating the data in this manner. Had the data
been collected with the intent of examining structure and dynamics, the performance variables would likely have
been more systematically sampled from the repertoire of performance behaviors. This would probably allow for
stronger substantive conclusions regarding the multivariate dynamic criterion space. As it stands, it is very
interesting that the self-reported performance variables did not load substantially on either of the variables-mode
components. Given that the components reflect behavioral activities and objective measures of performance, this
suggests that the self-reported measures may not accurately represent what a worker is actually doing.
There are three major limitations to this study. The first is that the data were not collected for the purpose
of a structural analysis. Therefore, the amount of shared variance among the indicator variables may not adequately
cover the latent job-performance variables. While this does not preclude the analyses done in this paper, data
collected with such analyses in mind would undoubtedly provide cleaner results. Second, there was a substantial
amount of missing data. However, only participant 16 appeared to show undue influence on the model solutions, and
this participant was removed from the analyses reported here. Third, the sample involved only two groups of
employees from one organization, limiting the external validity of the results.
The job performance domain and its three sources of variance should be examined further across a
sample from the population of jobs. This requires collecting experience sampling and other longitudinal data in
many organizations with many different types of employees. One possibility is applying these techniques to large
samples of jobs that are appraised on multiple performance indicators at fixed intervals, such as in the military. Such
formal performance evaluations at multiple points in time would address questions about the normative development
of job performance.
However, without examining other studies of performance and its determinants, the underlying dynamics in
studies with longer measurement epochs will never be fully understood (cf. Beal et al., 2005). Data to answer these
sorts of questions should be collected using multiple methods and examined using multiple analytic procedures.
Such a strategy will allow for a scientific understanding of the dynamic interaction of individual and workplace
Conclusion
The major performance theories in applied psychology indicate that the criterion space is a
multidimensional one. However, these theories are static; they are largely agnostic as to the dynamic nature of
performance and its determinants. It is well-established that the relationship between measures of predictors of job
performance and performance appraisal measures degrades over time. This situation suggests the possibility of
dynamism on both sides of the performance prediction model. In order to understand the dynamic relationship
between two variables, we must first understand the processes underlying each of the variables.
The results of the current study provide further evidence that organizational scientists must attend to
dynamic processes. Further, these results echo the exhortations of Ghiselli (1956) and Dalal and Hulin (2008) to
more broadly consider the criterion space when designing personnel interventions. It is important that these
between- and within-person sources of lawful variance be considered in such endeavors. Finally, continuing to think
of the performance prediction problem in terms of bivariate correlations prevents the development of a truly
scientific understanding of the processes by which attributes of employees interact dynamically with the work
REFERENCES
Alvarez, K.M., & Hulin, C.L. (1972). Two explanations of temporal changes in ability-skill relationships: A
Austin, J.T., Humphreys, L.G., & Hulin, C.L. (1989). Another view of dynamic criteria: A critical reanalysis of
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917-1992. Journal of Applied Psychology, 77, 70-88.
Barrett, G.V., Caldwell, M.S., & Alexander, R.A. (1985). The concept of dynamic criteria: A critical reanalysis.
Beal, D.J., Weiss, H.M., Barros, E., & MacDermid, S.M. (2005). An episodic process model of affective influences
Borman, W.C. (1991). Job behavior, performance, and effectiveness. In M. Dunnette & L. Hough (Eds.) Handbook
of Industrial/Organizational Psychology, Vol. 2 (pp. 271-326). Palo Alto, CA: Consulting Psychologists
Press.
Borman, W. C., & Motowidlo, S. J. (1997). Task performance and contextual performance: The meaning for
Campbell, J. P. (1991). Modeling the performance prediction problem. In M. Dunnette & L. Hough (Eds.)
Handbook of Industrial/Organizational Psychology, Vol. 2, (pp. 687-732). Palo Alto, CA: Consulting
Psychologists Press.
Campbell, J. P. (1994). Alternative models of job performance and their implications for selection and classification.
In M. G. Rumsey & C. B. Walker (Eds.) Personnel Selection and Classification (pp. 33-51). Hillsdale, NJ:
Erlbaum.
Campbell, J. P., McHenry, J.J., & Wise, L. L. (1990). Modeling job performance in a population of jobs. Personnel
Carroll, J.D., & Chang, J.-J. (1970). Analysis of individual differences in multidimensional scaling via an n-way
Cattell, R. B. (1944). “Parallel proportional profiles” and other principles for determining the choice of factors by
Cattell, R. B (1952). The three basic factor analytic research designs: Their interrelations and derivatives.
Chan, D. (1998). The conceptualization and analysis of change over time: An integrative approach incorporating
longitudinal mean and covariance structures analysis (LMACs) and multiple indicator latent growth model
Cleveland, W.S., & Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local
Csikszentmihalyi, M. & Larson, R. W. (1987). Validity and reliability of the experience sampling method. Journal
Dalal, R. S., & Hulin, C. L. (2008). Motivation in organizations: Criteria and dynamics. In R. Kanfer, G. Chen, and
R. Pritchard (Eds.), Work Motivation: Past, Present, and Future. (pp. 63-100). New York: Erlbaum.
Dalal, R. S., Lam, H., Weiss, H.M., Welch, E.R., & Hulin, C.L. (2009). A within-person approach to work behavior
relationships with affect and overall job performance. Academy of Management Journal, 52, 1051-1066.
Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251-254.
Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40, 1-4.
Goldberg, L. R. (2001, July 24). International Personality Item Pool: A scientific collaboratory for the development
http://ipip.ori.org/ipip/ipip.html
Harshman, R. A. (1970). Foundations of the Parafac procedure: Models and conditions for an “explanatory”
multimodal factor analysis. UCLA Working Papers in Linguistics, 16, 1-84. (University Microfilms, Ann
Harshman, R. A., & Lundy, M. E. (1984a). The PARAFAC model for three-way factor analysis and
multidimensional scaling. In H.G. Law, C. W. Snyder, J. A. Hattie, & R. P. McDonald (Eds.) Research
Harshman, R. A., & Lundy, M. E. (1984b). Data preprocessing and the extended PARAFAC model. In H.G. Law,
C. W. Snyder, J. A. Hattie, & R. P. McDonald (Eds.) Research methods for multimode analysis (pp. 216-
Hulin, C. L., Henry, R. A., & Noon, S. L. (1990). Adding a dimension: Time as a factor in the generalizability of
Humphreys, L.G. (1976). The phenomena are ubiquitous – but the investigator must look. Journal of Educational
Inn, A., Hulin, C. L., & Tucker, L. R. (1972). Three sources of criterion variance: Static dimensionality, dynamic
dimensionality, and individual dimensionality. Organizational Behavior and Human Performance, 8, 58-
83.
Keil, C. T., & Cortina, J. M. (2001). Degradation of validity over time: A test and extension of Ackerman’s model.
Kiers, H. A. L. (1991). Hierarchical relations among three-way methods. Psychometrika, 56, 449-470.
Kiers, H. A. L. (2000). Toward a standardized notation and terminology in multiway analysis. Journal of
Kroonenberg, P.M. (1994). The TUCKALS line: A suite of programs for the analysis of three-way data.
Kroonenberg, P.M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.
Kroonenberg, P.M., & De Leeuw, J. (1980). Principal component analysis of three-mode data by means of
Kroonenberg, P.M., & Oort, F.J. (2003). Three-mode analysis of multimode covariance matrices. British Journal of
Larson, R., & Csikszentmihalyi, M. (1983). The experience sampling method. H.T. Reis (Ed.) Naturalistic
Miner, A. G., Glomb, T., & Hulin, C.L. (2005). Experience sampling mood and its correlates at work. Journal of
Miner, A. G., Glomb, T., & Hulin, C.L. (In press). Experience sampling events, moods, behaviors, and performance
Motowidlo, S.J., & Van Scotter, J.R. (1994). Evidence that task performance should be distinguished from
Murphy, K.R. (1989). Is the relationship between cognitive ability and job performance stable over time? Human
Performance, 2, 183-200.
Murphy, K.R., & Shiarella, A.H. (1997). Implications of the multidimensional nature of job performance for the
validity of selection tests: Multivariate frameworks for studying test validity. Personnel Psychology, 50,
823-854.
Salovey, P., Mayer, J.D., Goldman, S.L., Turvey, C., & Palfai, T.P. (1995). Emotional attention, clarity, and repair:
Exploring emotional intelligence using the Trait Meta-Mood Scale. In J.W. Pennebaker (Ed.), Emotion,
Smilde, A., Bro, R., & Geladi, P. (2004). Multiway analysis: Applications in the chemical sciences. Hoboken, NJ:
Wiley.
Smith, P. C., Kendall, L., & Hulin, C. L. (1969). The measurement of satisfaction in work and retirement. Chicago:
Rand McNally.
Thoresen, C.J., Bradley, J.C., Bliese, P. D., & Thoresen, J.D. (2004). The big five personality traits and individual
job performance growth trajectories in maintenance and transitional job stages. Journal of Applied
Timmerman, M.E., & Kiers, H.A.L. (2000). Three-mode principal components analysis: Choosing the number of
components and sensitivity to local optima. British Journal of Mathematical and Statistical Psychology, 53,
1-16.
Tucker, L.R. (1964). The extension of factor analysis to three-dimensional matrices. In H. Gulliksen & N.
Frederiksen (Eds.), Contributions to mathematical psychology (pp. 110-127). New York: Holt, Rinehart, &
Winston.
Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279-311.
Weiss, H. M., & Cropanzano, R. (1996). Affective events theory: A theoretical discussion of the structure, causes
and consequences of affective experiences at work. In B. M. Staw and L. L. Cummings (Eds.), Research in
Organizational Behavior (Vol. 18, pp. 1 – 74). Greenwich, CT: JAI Press.
Multivariate Dynamic Criteria 25
Weiss, H. M., Nicholas, J. P., & Daus, C. S. (1999). An examination of the joint effects of affective experiences and
job satisfaction and variations in affective experiences over time. Organizational Behavior and Human
Van Mechelen, I., & Kiers, H.A.L. (1999). Individual differences in anxiety response to stressful situations: A three-
Zyphur, M.J., Bradley, J.C., Landis, R.S., & Thoresen, C.J. (2008). The effects of cognitive ability and
conscientiousness on performance over time: A censored latent growth model. Human Performance, 21, 1-
27.
Component
2 1
Rated Overall Service 0.865
Rated Average Quality 0.846
Rated Avg Handle Time 0.638
Table 7. Tucker core and variance explained for each 3-way combination of components.
                      Core element              Variance Explained
                      Objective   Behavioral    Objective   Behavioral
Time Component 1
  Subj 1               1.437       0.216         0.188       0.004
Time Component 2
  Subj 1               0.137      -0.173         0.002       0.003
  Subj 2              -0.347       0.442         0.011       0.018
  Subj 3               0.286      -0.297         0.007       0.008
Figure 2. Centered average time series for self-rated customer service variables.
Figure 4. Uncentered average time series for logged call time variables (objective performance).
Note. The trends for these variables seem dissimilar. They are therefore displayed uncentered so as to show
their unique trends more clearly.
Figure 5. Time components of the 3×2×2 Tucker3 solution after rotation of the first component to optimal
constancy. The second component is still orthogonal.
Figure 6. Joint biplot for participants and variables for the first time component.
[Joint biplot graphic: participants appear as numbered points and variables as labeled vectors (Citizn, CHandl, FocAll, BehNeg, Focloc, WaitTm, Qserve, AHandl, Servic, Durati, BehNeu); axis ticks run from -0.4 to 0.4.]
Figure 7. Joint biplot for participants and variables for the second time component.
[Joint biplot graphic: participants appear as numbered points and variables as labeled vectors (WaitTm, Citizn, Focloc, BehNeg, BehNeu, FocAll, AHandl, Qserve, Servic, CHandl, Durati); horizontal axis: First Component, with ticks from -0.6 to 0.2; plot dated 02/11/09 11:48:42.]