
Behavior Research Methods


2006, 38 (1), 32-50

Proximity coefficients as a measure of interrelationships in sequences of behavior
PAUL J. TAYLOR
University of Liverpool, Liverpool, England

Although a range of methods allow investigators to measure the local dependencies among behaviors in a sequence, only indirect methods are available for measuring the interrelationships among behaviors across an entire sequence. This article introduces a new “proximity” coefficient that measures interrelationships among behaviors as a direct function of their intrinsic organization within a sequence. The coefficient does not depend on a user-defined “window” of analysis and provides an efficient use of data that facilitates comparisons across actors, over time periods, and between single cases. An analysis of artificial data shows further properties of the coefficient, including a diagonal value that reflects the degree to which a behavior is reciprocated, and an asymmetry in values that depicts the relative precedence among behaviors. Extensions of the coefficient to the multivariate case, and its relation to existing methods of analyzing sequences, are discussed.

The author thanks Mehdi Mobli for his help in deriving the statistical equations for the P coefficient. Correspondence should be addressed to P. J. Taylor, School of Psychology, Eleanor Rathbone Building, University of Liverpool, Liverpool L69 8PZ, England (e-mail: pjtaylor@liverpool.ac.uk).

Several recent articles in this journal have presented new ways of extracting patterns of relations from sequences of behavioral events. Each applauds the insights that have come from using existing sequence methods, but also makes a case for why these methods do not fully capture the richness of the patterns in behavior sequences. Griffin (2000), for example, shows how exploring multisubject interaction as transitions over a single stream of behaviors does not capture the way that individuals jointly contribute to or coordinate the interaction process. Similarly, Magnusson (2000) provides evidence of patterns that often remain hidden because they are either distorted by intermittent events or located farther apart in the sequence than current methods of analysis consider. This last notion, that important associations lie beyond the stochastic “window” in which existing techniques measure relations, is the focus of this article. The article will review some limitations of existing approaches to analyzing sequences, and will argue that a notion of proximity captures the existing methodological viewpoints in a more flexible way. The proximity between behaviors in a sequence will be shown to be measurable by a new statistical coefficient, which does not depend on a user-defined “window” of analysis and provides an efficient use of data that facilitates comparisons across actors, over time periods, and between single cases.

The Analysis of Behavioral Sequences

The analysis considered in this article begins with an observed sequence of behavioral events, such as the dialogue of a hostage negotiation (Taylor, 2002a) or the events that lead up to a driving accident (Clarke, Forsyth, & Wright, 1998). This sequence is parsed into a series of behavioral units (e.g., utterances or thought units) and these units are coded as one of a number of categories, which are usually predefined to capture conceptually meaningful variations in behavior over the sequence. The resulting sequence of behaviors may be examined for a variety of issues relating to the occurrence, co-occurrence, and ordering of behaviors over time. Investigators have traditionally taken one of two main approaches to measuring such relationships among the behaviors (Abbott, 1995).

Stochastic methods. One approach focuses on patterns and regularities in the step-by-step transitions among behaviors. The approach involves counting the number of times one behavior does and does not precede a second behavior, and from these counts computing a measure of the association between the two behaviors (Bakeman, McArthur, & Quera, 1996; Fagen & Mankovich, 1980; Wampold, 1992). In more sophisticated analyses, the frequencies may be combined with other conditional frequencies and submitted to a lag or log analysis as a way of determining the relative associations among combinations of behaviors (Chatfield & Lemon, 1970; Iacobucci & Wasserman, 1988). The resulting associations (or “chains” of these associations) may be interpreted directly to provide insights into the character and underlying units of different types of sequences (Gottman, Markman, & Notarius, 1977; Taylor & Donald, 2003). Alternatively, they may be compared across groups to determine whether or not the two groups differ with respect to the transitions of interest (Bakeman et al., 1996; Yoder, Bruce, & Tapp, 2001).

By measuring the connections among a small number of adjacent behaviors, these stochastic methods excel in providing a detailed picture of the contingencies that shape a sequence. However, their restricted focus also makes it difficult to address more global questions about

 Copyright 2006 Psychonomic Society, Inc.



the organization of behavior, particularly in terms of understanding how the transitions come together to create the variety of frames (Drake & Donohue, 1996), emphases (Taylor, 2002a), or paths (Holmes & Sykes, 1993) that sequences take as they unfold over time. For example, an analysis of the major frames that individuals adopt in negotiation requires a method that captures larger scale groupings in the interrelationships among behaviors (Taylor, 2002a). Similarly, investigating the process by which groups make decisions requires an analysis that identifies how larger, coherent periods of behavior are ordered over time (Poole & Roth, 1989). The problem highlighted by such examples is that important relationships lie beyond the limited window or focus of sequential association measures. These relationships might speak to the broader structure of the interaction process, as is the case when investigators identify phases or trends in behavior over time (Holmes & Sykes, 1993; Taylor, 2002b). Or they might result from the delayed effect of certain behaviors on the sequence, as is the case when sequences contain a number of intermittent behaviors before a behavior reoccurs or is reciprocated (Magnusson, 2000; Putnam & Jones, 1982). The importance of such relationships suggests that investigators would do well to increase the window or “zone of association” within which they measure the dependencies among behavioral events. It suggests that investigators should continue to focus on intrinsic associations, but that they should do so in a general way that encompasses the various degrees of association across a sequence.

Whole-sequence methods. The second approach to sequence analysis forgoes the detail offered by step-by-step approaches in an effort to capture the broader organization of behavior over the whole sequence (Abbott, 1995). One common method involves dividing a sequence into a series of smaller sub-sequences, wherein the co-occurrence of two behaviors may be measured using an association coefficient (Donohue, 1991; Taylor, 2002a). The resulting association value indicates the extent to which the two behaviors typically co-occur in the sequence, but it does so over a series of intervals large enough to capture the relative interrelationships among many different behaviors. The changing emphasis of such intervals is the focus of a second technique known as phase analysis. This analysis produces a “map” of the sequence that consists of a series of coherent phases, where each phase is an interval in which behavior has a single predominant substantive function (Holmes & Sykes, 1993). As with other whole-sequence methods, the result of phase analysis is a macrolevel picture of the overall structure of the behavioral sequence.

These methods are excellent at identifying the major changes in behavior across a sequence. However, to derive this global picture, they typically measure the connections among behaviors indirectly, by imposing extrinsic criteria. For example, the analysis of behavioral relations within artificially created subsections of a sequence gives no consideration to the relationships among behaviors in different sections. Phase analysis moves even farther away from the data by grouping behaviors into a series of subsections that are imposed according to a set of external and somewhat arbitrary rules. As these examples suggest, whole-sequence methods achieve their representation of a sequence’s structure indirectly, and not from the actual contingencies among behaviors. Although these methods can yield useful results, the relative value of their output will always depend on the tolerance of the data to the method’s various assumptions. This suggests that investigators need a method with results that are comparable to those provided by existing whole-sequence techniques, but with input consisting of the intrinsic relations among cues and responses. Such a method, in line with the next stage for stochastic methods, requires a way of measuring interconnections among behaviors in a more general way.

The Concept of Proximity

I propose that the starting point for a general approach is the notion, inherent in current methods, of measuring (in one form or another) the relative collocation among behaviors in a sequence. This notion may be seen as a general proximity principle: Behaviors close together in a sequence have more in common, in terms of the actor’s motivating concerns, strategies, and cognitions, than those far apart in the sequence. Proximity is basic to what allows our social interactions to make sense, with actions or utterances linking into other actions or utterances to shape a fluid, unfolding process. In face-to-face negotiation, for example, an individual is able to focus his or her dialogue on a chosen issue by using a particular group of behaviors, such as the repeated use of criticisms and insults to attack the other party’s identity (Taylor, 2002a). These behaviors occur in close proximity and therefore have a quality in common that does not exist for behaviors not in close proximity.

The proximity principle assumes that previous behaviors in a sequence have an association with the current behavior, and that the extent of this association decreases with increasing temporal distance from the current behavior. This approach incorporates the stochastic nature of sequences by conceptualizing behavior as being more dependent on recently occurring behaviors than on behaviors that occurred farther in the past. It also retains the notion of behaviors holding relationships with all other behaviors in a sequence, because proximity is conceptualized as decreasing across the entire sequence. Thus, in conceptualizing behavior relationships by their relative collocation, the proximity principle extends the “zone of association” from a restrictive window of the sequence to the whole sequence. It provides a way of understanding the interrelationships among behaviors as per whole-sequence methods, but it does this in the language of local cue–response relationships, as per stochastic methods. Proximity, therefore, provides a window into the world of global dynamics by looking at the local cue–response sequences.

Note that proximity is not a new concept to either theory or method. In the social interaction literature, proximity underlies the notion of limitation or channeling, whereby

every exchange of messages is seen as narrowing down the probability that other categories of talk will occur (Putnam, 1985; Watzlawick, Beavin, & Jackson, 1968). Those behaviors in immediate proximity may be shown to have the most influence on the direction of interaction development, but those farther back in history also have their own unique association with current behavior (Taylor & Donald, 2003). The connection between proximity and interaction research also extends to theories that incorporate dimensions (Collins, 1981; Donohue, 1998), since these theories rest on the notion that behaviors close together in a sequence hold similar positions over the dimensions. To not do so would mean that the dimensions failed to capture behavioral dynamics in a coherent and meaningful way.

Proximity may also be seen in previous approaches to analyzing sequences. For example, when investigators divide a sequence into a series of sub-sequences and count frequencies of behaviors, they are imposing an artificial boundary whereby all those behaviors within the boundary are treated as having proximity and all those outside the boundary as not having proximity. Phase analysis examines proximity through a similar, stricter criterion that considers only uninterrupted sequences of identical behaviors to be related. Interestingly, the relationship between proximity and the degree of association between two behaviors is the focus of other analyses (e.g., Markov or log-linear analysis), which measure the extent to which one behavior can be used to predict future behavior. These methods consider directly the effect of proximal behaviors on the development of a sequence.

Coefficient of Proximity

The evidence above suggests that the proximity concept may provide a powerful way of capturing the general interrelationships among behaviors in a sequence. To empirically measure proximity, an index is required that meaningfully captures the relative proximity among pairs of behaviors in a sequence. This index is proposed in the form of a proximity coefficient that expresses the interrelationships among types of behavior as a direct function of their relative placements in a sequence. A sequence is defined as a single stream of observations (e.g., ABABABCCD) in which different behaviors (e.g., A, B, C, and D) may occur rarely or frequently, and in any order. As with many coefficients of association, the coefficient of proximity is constructed to vary between 0 and 1. A zero coefficient for two behaviors, A and B, implies that the distance between the occurrence of A and B is maximum (i.e., the length of the sequence). A coefficient with a value of 1 implies that behavior A immediately precedes B throughout the sequence, with no exception. Coefficients with values between these two extremes reflect intermediate degrees of proximity between A and B. Specifically, values of the coefficient are calculated as a function of the number of acts separating relevant pairs of behaviors across the sequence. The exact nature of the function may differ depending on theoretical considerations, but a simple count of intervening behaviors suffices, in most cases. Regardless of the function used, the proximity coefficient decreases monotonically as more behaviors are found, on average, to separate the two behaviors being examined.

Calculation of the Proximity Coefficient

Consider the case of a single sequence of behaviors obtained from a coded social interaction. Let the behaviors in the sequence be denoted by $s_i$ ($i = 1, 2, \ldots, n$), where $i = 1$ for the first behavior in the sequence and $i = n$ for the final behavior in the sequence. Thus, the sequence “EAB” would be indexed as $E = s_1$, $A = s_2$, $B = s_3$, and $n = 3$. The set of behaviors used to code the sequence will be denoted by an unordered set $V = \{v_1, v_2, \ldots, v_m\}$, where $m$ varies, depending on the number of coding categories used to represent behavior in the sequence. If $s_i$ is a specific occurrence of $v_p$ (where $p$ can be $1, 2, \ldots, m$), then we write $s_i = v_p$. Finally, $n_p$ denotes the number of times a particular behavior occurs in the sequence.

The proximity coefficient between any two codes, $v_p$ and $v_q$, may be regarded as asking, for each $v_p$, to what extent one must move through the sequence to observe $v_q$. It is important to note that this question seeks to find the distance between a given $v_p$ and the first following instance of $v_q$, rather than every following instance of $v_q$. The search for the minimum distance between codes is both statistically and theoretically important. It is statistically important for ensuring that the coefficient’s value is not dependent on the number of times a given code appears within the sequence. If the coefficient were derived from a behavior’s proximity to all following behaviors, there would be a tendency for frequently occurring codes to be associated with higher coefficients, since their greater occurrence would lead them to be positioned closer together in the sequence, on average, than were less frequent codes. The search is theoretically important for ensuring that proximity is measured consistently for relations across a sequence. For example, if there are only two occurrences of A in a sequence, and both are immediately followed by B, then the coefficient for A to B should specify maximum proximity regardless of whether the occurrences of A are positioned adjacent to one another or at opposite ends of the sequence. Of course, it remains possible that the second instance of B depends on both occurrences of A, but the coefficient views such subsequent relations as being adequately captured by the relative differences among the behaviors’ most immediate proximities. In cases in which it is known that the organization of a sequence hinges on successive dependencies, the coefficient might best be applied with more caution.

To find the first instance of $v_q$ from $v_p$, the proximity coefficient identifies the minimum difference in the indices associated with $v_p$ and subsequent instances of $v_q$ through

$$d(s_i = v_p, v_q) = \min[j - i] - 1, \quad \text{for all } s_j = v_q,\ j > i, \tag{1}$$

where $d(s_i = v_p, v_q)$ is the distance between an occurrence of code $v_p$ at position $i$ and the first occurrence of code $v_q$

occurring at a position $j$, greater than $i$. The subtraction of 1 is necessary to ensure that $d(s_i = v_p, v_q)$ equals 0 when $v_q$ immediately follows $v_p$, since this is conceptualized as a case of perfect proximity (i.e., there are no intermediate behaviors).

Since $v_p$ may occur many times within a sequence, it is possible to obtain a better expression of $d(s_i = v_p, v_q)$ by averaging across every occurrence of $v_p$,

$$\frac{1}{n_p} \sum_{s_i = v_p} d(s_i = v_p, v_q). \tag{2}$$

The proximity coefficient (P) simply restates Equation 2 in a standardized form, as a proportion of $n$:

$$P(v_p, v_q) = 1 - \left[ \frac{\sum_{s_i = v_p} d(s_i = v_p, v_q)}{n_p (n - 2)} \right]. \tag{3}$$

The addition to the denominator is $n - 2$ rather than $n$ to give a count of the number of possible distances between the two end codes rather than simply the number of codes.

An inspection of the limits of $d(s_i = v_p, v_q)$ confirms that P is bounded between 0 and 1. Take the case of minimum proximity, in which $v_p$ occurs once at the beginning of the sequence (i.e., $s_i = 1$) and $v_q$ occurs once at the end of a sequence (i.e., $s_j = n$). Restating Equation 1 in sample terms gives

$$d(s_i = v_p, v_q) = [n - 1] - 1, \tag{4}$$

which, used with Equation 3, gives

$$P(v_p, v_q) = 1 - \left[ \frac{[n - 1] - 1}{n_p (n - 2)} \right]. \tag{5}$$

Since $n_p = 1$, the members of the loss function in parentheses equate to 1 and $P(v_p, v_q) = 0$. The upper limit $P(v_p, v_q) = 1$ occurs only when $v_q$ always immediately follows $s_i$, and is similarly established by replacing $i$ with $(i + 1)$ in Equation 2. This results in $d(s_i = v_p, v_q) = 0$, and, consequently, the loss function equating to 0 and $P(v_p, v_q) = 1$.

Note that the coefficient described above is an example of a general approach to measuring the interrelations among behaviors of a sequence. One simple generalization of the equations given above is to introduce a weighting $w$ to the value returned by the numerator in Equation 2,

$$\frac{1}{n_p} \sum_{s_i = v_p} w \times d(s_i = v_p, v_q). \tag{6}$$

The choice of weighting implicit in Equations 2 and 3 (i.e., $w = 1$) measures proximity among behaviors independent of further theoretical restrictions. However, it is easy to perceive a situation in which it is appropriate to restrict the scope of proximity on the basis of theoretical or data considerations. For example, if the cue–response–cue–response sequence (i.e., triple-interact; see Taylor & Donald, 2003) is accepted as the “window” within which a particular cue influences the progress of interaction, then adopting $w = (n - 2)/4$, where $d(s_i = v_p, v_q) < (n - 2)$, will limit estimation of proximity to only those behaviors found within the triple-interact window. In this situation, those behaviors outside of the four-behavior window are considered to have no proximity to those within the window and so add nothing to the resulting coefficient scores. Other choices do not necessarily involve arithmetic changes to the way distances among behaviors are measured, and may instead be conditional to return a particular value based on some property (quality) of the distance between $s_i$ and $s_j$ or some other external criterion.

One particularly important use of weighting occurs when investigators want their coefficients to measure the absolute distance between two codes. This will occur when an investigator wants to test hypotheses about specific differences in the proximity of two behaviors across different sequences. In the form presented above, P expresses the distance between two codes as a proportion of the number of codes in the sequence. This makes the derived coefficient dependent on the length of the sequence examined. For example, if behavior A occurs only once at the beginning of a sequence, and behavior B occurs only once at the end of that sequence, $P(A, B) = 0.00$ regardless of whether the sequence is 10 behaviors or 1,000 behaviors in length. However, behaviors A and B are separated by far more intervening events in a 1,000-behavior sequence than in a 10-behavior sequence, and investigators may want to capture this kind of disparity when conducting their comparisons. To capture the absolute distance between behaviors, it is necessary to weight $d(s_i = v_p, v_q)$ by a ratio of the lengths of the sequences being compared. This ratio may be expressed in a general form as

$$w = \frac{n_k - 2}{n_{k\max} - 2}, \tag{7}$$

where $n_k$ is the number of codes that appear in the sequence for which the proximity coefficients are being calculated, and $n_{k\max}$ is the number of codes that appear in the largest sequence of the data set. The weighting in Equation 7 simply rescores the distances observed between $v_p$ and $v_q$ in smaller sequences, as if they were distances observed in a sequence with length $n_{k\max}$. The result is a loss function that is equivalent over different sequence lengths, thereby producing a set of proximity coefficients that reflect the absolute distance among codes; their values may be directly compared.

Illustrative Example

To illustrate the proximity coefficient in sample data, suppose we observe two interaction sequences. The left panel of Table 1 illustrates these sequences using letters to denote the occurrence of behaviors, with different letters indicating the occurrence of different types of behaviors. Sequence 1 involves five behaviors that occur in a sequence of 10 units. Sequence 2 involves the same five

Table 1
An Example of Two Behavior Sequences and Their Resulting Proximity Coefficient Matrices

Rows give the first observation type (va); columns give the following observation type (vb).

Sequence 1: E A B A B A B C C D

  va      A     B     C     D     E
  A      88   100    63    38     –
  B     100    88    75    50     –
  C       –     –   100    94     –
  D       –     –     –     –     –
  E     100    88    25     0     –

Sequence 2: E A B A B A B C C D . . . E A B A B A B C C D

  va      A     B     C     D     E
  A      90   100    83    72    67
  B      96    90    89    78    72
  C      86    81    85    97    92
  D      94    89    61    50   100
  E     100    94    67    56    50

Sequence 3 with absolute proximities: E A B A B A B C C D

  va      A     B     C     D     E
  A      94   100    83    72    29
  B     100    94    89    78    29
  C      29    29   100    97    29
  D      29    29    29    29    29
  E     100    94    67    56    29

Note: Decimal points omitted.
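
The coefficients reported in Table 1 can be checked with a short script. What follows is a minimal sketch in Python of Equations 1 to 3, with the weighting of Equations 6 and 7, and not the author's own implementation. The function name `proximity` and its parameters are hypothetical. The sketch also rests on one assumption that is inferred from the Sequence 1 values rather than stated in the text: the average is taken only over those occurrences of the first code that are followed by at least one occurrence of the second code, and the coefficient is treated as undefined when there are none.

```python
from typing import Optional, Sequence


def proximity(seq: Sequence[str], p: str, q: str, w: float = 1.0) -> Optional[float]:
    """Proximity of code q following code p in seq (hypothetical helper).

    Returns None when q never follows p, i.e. the coefficient is undefined.
    """
    n = len(seq)
    if n < 3:
        raise ValueError("the n - 2 denominator needs at least 3 units")
    distances = []
    for i, code in enumerate(seq):
        if code != p:
            continue
        # Equation 1: count the units between this p and the FIRST later q.
        for j in range(i + 1, n):
            if seq[j] == q:
                distances.append(j - i - 1)
                break
    if not distances:
        return None
    # Assumption (inferred from Table 1): n_p counts only the occurrences
    # of p that are followed by some q.
    n_p = len(distances)
    # Equation 3, with the weighting w of Equation 6 applied to the distances.
    return 1.0 - w * sum(distances) / (n_p * (n - 2))


seq1 = list("EABABABCCD")   # Sequence 1 in Table 1
seq2 = seq1 * 2             # Sequence 2: the same 10 units repeated
w_abs = (len(seq1) - 2) / (len(seq2) - 2)   # Equation 7 with n_kmax = 20

for q in "BCD":
    print("A ->", q, proximity(seq1, "A", q), proximity(seq1, "A", q, w=w_abs))
```

Under these assumptions the unweighted values for A followed by B, C, and D in Sequence 1 are 1.0, 0.625, and 0.375, which agree with the 100, 63, and 38 reported in Table 1 once rounded half up, and the weighted values are 1.0, about 0.83, and about 0.72, which agree with the Sequence 3 panel.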

behaviors but is 20 units in length. Sequence 3 is a reanalysis of Sequence 1 that uses a different weighting of the proximities among behaviors. The matrices on the right panel of Table 1 give the proximity coefficients associated with each sequence.

An inspection of Sequence 1 shows that behavior E and behavior D occur only once, and at opposing ends of the sequence, so that their proximity is the minimum possible. In contrast, behavior B always occurs directly after behavior A, so that the proximity of these behaviors is the maximum possible. Consistent with these two limits, the coefficient matrix reports a perfect association between A and B (1) and a complete nonassociation between E and D (0). All of the other relationships among the codes have intermediate values that are dependent on their distances apart in the sequence. For instance, there are descending values of coefficients for behavior A in relation with B (1), C (.63), and D (.38). Examining the sequence confirms that behavior A is closest, on average, to behavior B; is slightly less close to the two occurrences of behavior C; and is farthest away from the concluding behavior, D.

The undefined value of the coefficient measuring the relationship of A to E is appropriate for Sequence 1, because E never follows A. While missing coefficients are an inevitable consequence of short sequences, the interpretation of missing coefficients within longer sequences can provide an indication of the relative distribution of behavior within the sequence. A large number of missing values in a variable row (or column) indicates that most observations of the behavior occurred toward the end (or beginning) of the sequence, as is the case for behavior C in Sequence 1. At the extreme, a column of missing values indicates a behavior that occurs only at the beginning of a sequence (e.g., behavior E in Sequence 1), whereas a row of missing values indicates a behavior that occurs only at the last position in the sequence (e.g., behavior D in Sequence 1).

Note that the matrix for Sequence 1 also reports a coefficient for a behavior preceding itself over the course of the sequence. Proximity coefficients on the diagonal of a matrix are meaningful and provide a measure of the amount of reciprocity (Putnam & Jones, 1982) associated with the relevant behavior. The coefficients actually provide a graded measure of reciprocation, in the sense that they quantify the number of intervening codes that occur on average before reciprocation, rather than simply the proportion of immediate reciprocation. Thus, coefficients on the diagonal may be used to test hypotheses about the nature and breadth of reciprocity, such as the possibility that individuals delay their reciprocation to allow for intermediate behaviors that confirm the initial statement (Putnam & Jones, 1982). In general, predictions about reciprocity or any other relationship may be defined in terms of the proximity between two behaviors, where deviation from the predicted proximity may be measured for its statistical likelihood (Efron & Tibshirani, 1986). If the deviation is small enough, then it might be possible to conclude that the observed degree of reciprocity is unlikely to have occurred by chance.

Sequence 2 contains the same behaviors as Sequence 1 but includes a repeat of the 10 behaviors in Sequence 1. The extra length of Sequence 2 provides sufficient observations to allow the calculation of proximity coefficients for all pairs of behavior (which is not necessarily true of existing methods), thereby leaving no empty cells in the matrix. As before, relative proximity in the sequence is reflected in the coefficients, with, for example, descending values occurring for the relation of behavior A with

B (1), C (.83), D (.72), and E (.67). Because Sequence 2 only repeats the ordering of behaviors in Sequence 1, the organization of behaviors remains constant across the two sequences and, consequently, so does the rank ordering of the coefficient values. However, the absolute values of the coefficients are higher for Sequence 2 when compared with Sequence 1. This results from the fact that proximity is estimated in the context of the complete sequence. Because it is not necessary to search through more than 50% of Sequence 2 to find an occurrence of A preceding B, C, D, or E, the coefficients for behavior A are all above .50. Indeed, in the current sequence, this happens to be the case for each of the 5 behaviors.

A second difference between the matrix produced from Sequence 1 and the matrix produced from Sequence 2 is asymmetry in the coefficient values. For example, the coefficient for behavior A preceding B (i.e., 1.00) is no longer identical to the coefficient for behavior B preceding A (i.e., .96). In general, matrices of proximity coefficients will be asymmetrical, reflecting the possibility that one behavior precedes the second behavior on the majority of occasions. For example, in Sequence 2, behavior A occurs before B on six occasions, whereas behavior B occurs before A on five occasions. This leads to a small difference of .04 (i.e., 1.00 − .96 = .04) between the coefficients, which reflects the slight bias toward behavior A occurring before behavior B. Larger differences between coefficients suggest a greater asymmetry in the ordering of two behaviors, with a missing coefficient indicating that all instances of a behavior precede all instances of the second behavior.

Finally, Sequence 3 is a reanalysis of Sequence 1, in which the coefficients were weighted to make them directly comparable to those derived from Sequence 2. Specifically, the coefficients for Sequence 3 were calculated by applying the weighting given in Equation 7 with $n_{k\max} = 20$, which is the number of behaviors in the larger comparison sequence, Sequence 2. One consequence of this weighting is that the absolute proximities measured for Sequence 3 are markedly higher than the coefficients for Sequence 1 (see Table 1). This occurs because P expresses the distances among behaviors as a proportion of overall sequence length, so that they become relatively smaller (and consequently, P larger) in the context of the longer 20-behavior sequence (i.e., Sequence 2) than in the original 10-behavior sequence. A second consequence of the weighting is that distances among behaviors are measured identically for both sequences. In this example, because Sequence 2 simply repeats the 10 behaviors of Sequence 3, the relative distances among the behaviors are equivalent in both sequences. This equivalence is reflected in Table 1 by the identical values of the coefficients. For example, there are descending coefficients of identical value for the relation of behavior E with A (1), B (.94), C (.67), and D (.56). As before, the shortness of Sequence 3 leads to the missing value associated with the relation E to itself, which is computable for Sequence 2. The shortness of Sequence 3 is also responsible for discrepancies in the values of the coefficients on the diagonal of the matrix.

Extensions and Relationships

Extensions in Analytical Techniques

Multivariate proximities. A second important property of P is that it is analytical in $d$, facilitating multivariate extensions. The treatment above has been for the case of assigning each unit one behavioral code, but it is possible to compute a single coefficient for cases in which units are assigned codes from more than one coding scheme. One approach is to calculate conditional proximities of the form

$$P(v_{p_1}, v_{q_1} : v_{p_2}, v_{q_2} : \ldots : v_{p_k}, v_{q_k}) = 1 - \left[ \frac{w_1 \times d(s_i = v_{p_1}, v_{q_1})}{n_{p_1} (N - 2)} \right] \times \left[ \frac{w_2 \times d(s_i = v_{p_2}, v_{q_2})}{n_{p_2} (N - 2)} \right] \times \cdots \times \left[ \frac{w_k \times d(s_i = v_{p_k}, v_{q_k})}{n_{p_k} (N - 2)} \right], \tag{8}$$

where each $d(s_i = v_{p_k}, v_{q_k})$ is a different coding applied to the interaction sequence, $w_k$ is the weighting for that coding, and $n_{p_k}$ is the number of times the behavior from that coding occurs in the sequence. This approach produces a coefficient that denotes the proximity in which a particular set of behaviors (i.e., each $v_q$) follow the codes assigned to the current behavior (i.e., each $v_p$). The possibility of measuring the interrelationships among several behavioral streams should be useful, for example, to investigators who want to map the connections between individuals’ verbal behavior and their self-reported “stream of thought” (Sillars, Roberts, Leonard, & Dun, 2000). A set of associations might be predicted between the structure of cognitive units and groups of behaviors, with future analyses examining how changes in context or person variables mediate these connections. Other extensions might combine an analysis of dialogue with an analysis of nonverbal cues (Beattie & Shovelton, 2002), or simultaneously examine substantive and relational aspects of dialogue by employing the relevant coding schemes (Donohue, 1998).

Testing the null hypothesis. In some circumstances, investigators may want to evaluate whether or not the proximity observed between two behaviors is likely to have occurred by chance. Specifically, they may want to evaluate whether or not an observed coefficient is important (e.g., high in value) by comparing it to the value expected for that coefficient if behaviors were distributed randomly in the sequence. For example, an investigator of parent–child interaction may want to evaluate whether the reciprocation of a parent’s smile, as measured by $P(\text{Smile}, \text{Smile})$, is significantly higher than the $P(\text{Smile}, \text{Smile})$ expected under the null hypothesis of chance reciprocation. Since the observed P coefficient will be interpreted differently depending on how much it deviates from the expected value, it is important to have a method of estimating the distribution of P under the null hypothesis. One solution to this problem is to permute the observed sequence many times (e.g.,

of a matrix may be interpreted in a similar way to that described above, with lower values indicating lower proximity within the sequence. In contrast, coefficients on the
10,000 times) while calculating P coefficients for each diagonal of the matrix will have a different interpretation,
permutation (Efron & Tibshirani, 1986). This procedure which relates to the average duration of occurrence of the
would provide the empirical distribution of each P(vp, vq) relevant behavior. The attractive feature of all these coef-
under the hypothesis of randomness, from which a p value ficients, however, is that they represent a general measure
for the observed P(vp, vq) may be estimated by locating that does not depend on a specified time window or tar-
its value in the empirical distribution. The nearer the ob- get event to measure the relationships among behaviors
served P(vp, vq) to the tails of the empirical distribution, (Bakeman & Quera, 1995).
the more confident an investigator can be that the proxim-
ity between vp and vq did not occur by chance. Relationships With Other Techniques
Comparison of two coefficients. Some investigators To help round out the picture of where proximity coef-
may be keen to test the difference between two proxim- ficients fit as a technique for analyzing sequences, it may
ity coefficients or mean coefficient values. Comparisons be useful to show that P has precise relationships with
of coefficient values may be used to test the relationship other sequence methods.
between a range of independent variables and the orga- Sequential association measures. As outlined by
nization of speakers’ cues and responses. For example, a Bakeman et al. (1996), existing indices such as phi or
comparison of coefficients derived from actual negotia- kappa are successfully used to measure the strength of
tions and training simulations provides a detailed account association or transitional likelihood among adjacent be-
of the various differences in interaction dynamics between haviors. These indices measure the specific case of imme-
these two contexts. The typical behavioral chains, the im- diate proximity, in which only neighboring behaviors are
pact of different strategies on the other party’s response, conceptualized as having joint influence on interaction.
and the long-term ordering of behavior are just three as- The indices are therefore equivalent to proximity coef-
pects of behavior that can be examined by comparing such ficients that are weighted to disregard behaviors that do
matrices. For most purposes, such comparisons may be not immediately follow vp. This equivalence is not perfect,
achieved by computing a permutation test on the differ- however, since the weighted P would be calculated from
ences among coefficient values, as outlined by Efron and measures of how often a target behavior does and does not
Tibshirani (1986), Bakeman et al. (1996), and others. The follow vp, whereas most association indices also consider
analysis may assume independence of sampling when cal- how often the absence of vp is and is not followed by the
culations are made across different sequences but would target behavior. The difference here is that most associa-
require adjustment to account for dependence among ob- tion indices equate joint absence of two behaviors as in-
servations when comparing within a single sequence. In- dicating a stronger association, whereas P does not make
terestingly, two different possibilities are available for es- this assumption and relies on a conditional likelihood as
timating the sampling distribution in this scenario. When an estimate of the relationship between the behaviors. This
comparing groups of sequences, the distribution may be offers an important theoretical alternative, particularly
derived through random sampling of the same coefficient in cases in which the investigator is unsure whether the
in the different sequences. However, for small samples, it behavior did not occur or was not observed as occurring
is also possible to estimate the extremity of a particular (Dice, 1945; Taylor, Bennell, & Snook, 2002).
difference by deriving a distribution of differences from Lag sequence analyses. Lag analysis exemplifies a
all possible pairings of coefficients in the proximity ma- number of methods that examine the relative frequencies
trix. The permutation approach is therefore likely to pro- of immediate and longer term transitions among behav-
vide the most appropriate test of differences in small and iors (Bakeman & Quera, 1995). Conceptually, lag analysis
large data sets. simply looks at specific instances of proximity, with lag 1
Extensions to time sequence data. The development relating to what appears in immediate proximity, lag 2 re-
of recording technologies has increasingly led investiga- lating to what appears in less proximity, and so on. The
tors to record and analyze data in which the relative timing analysis may be replicated by using a series of weight-
and duration of events are represented. This form of data, ings, with the relevant proximity values for each weight-
known as timed event sequence data (Bakeman & Quera, ing plotted in the same way investigators plot behavioral
1995), may in principle be analyzed by proximity coef- occurrence across different lags (Putnam & Jones, 1982).
ficients that use the difference between offset and onset However, this approach defeats the purpose of the proxim-
times as the basis for measuring d(si 5 vp, vq). Specifi- ity coefficient, which is to combine all of the lags into one
cally, the modified coefficient would replace the index i coefficient that denotes the average occurrence of higher
with a count of seconds elapsed from the beginning of the lags (i.e., further distance between behaviors) with a lower
sequence, so that d(si 5 vp, vq) reflects the gap in seconds coefficient value.
between the offset of vp and the onset of vq. P would then Gamma analysis. Gamma analysis is a set of nonpara-
reflect the gap between the offset and onset times as a metric statistics that provide a measure of the general order
proportion of the sequence’s total length in seconds. When of behaviors in a sequence and a measure of the distinc-
calculated in this way, P coefficients on the off-diagonal tiveness or overlap of behavior types (Pelz, 1985). Psycho-
     TAYLOR

logical research has typically used three measures: Pelz’s creasing in a manner that represented the order of phases.
gamma, which measures the proportion of A behaviors Coefficient values of less than 1 on the diagonal would
that precede or follow B behaviors in a sequence; prece- indicate a separation or recycling of the phases. A detailed
dence scores, which indicate the location of the behaviors analysis of the lower matrix coefficients would give some
in the overall ordering of element types; and separation indication as to whether reoccurring phases had a com-
scores, which give an indication of the relative distinctive- mon predecessor.
ness of behavior types. There is no direct relation between Optimal matching. An extension of phase analysis,
the proximity coefficient and Pelz’s gamma or precedence known as optimal matching analysis, consists of tech-
scores, because P is based on distances between behaviors niques that compute either the overall similarity of two
and not on the relative ordering of the elements. However, or more sequences or the similarity of these sequences to
a parallel measure is given by the disparity in P for any a prototypical sequence (Holmes, 1997; Sankoff & Krus-
two behaviors (i.e., P(A,B) 2 P(B,A)), where the resulting kal, 1983). The result of optimal matching is a dissimilar-
value provides an indication of the difference in likelihood ity score that can be compared with other scores to give
of A preceding or succeeding B. A larger disparity indi- an indication of the differences among sequences (e.g.,
cates a greater asymmetry in the ordering of occurrences through multidimensional scaling). Although proximity
among the two behavior types. offers no direct comparison to this approach, it is pos-
In contrast to the above indirect relationship, a direct sible to generate a similar analysis by comparing matrices
relation exists between separation scores and the diago- of proximity coefficients computed from different se-
nal of a proximity coefficient matrix. Coefficients on quences. Specifically, the disparity between two matrices
the diagonal denote the extent to which a single behavior may be used as a measure of the dissimilarity between the
reciprocates without intervening or “separating” behav- associated sequences, with greater disparity indicating a
iors. A high coefficient on the diagonal will indicate a larger difference in the behavioral organization of each se-
relatively coherent, separate period of occurrence for the quence. By computing a measure of dissimilarity for two
relevant behavior. In contrast, a low coefficient suggests matrices, and repeating this calculation for every pair of
little separation of the behavior from other acts in the se- matrices derived from a data set, it is possible to generate a
quence. However, whereas a separation score measures set of coefficients that measure sequence dissimilarities in
the extent to which a behavior forms a single coherent a way comparable to optimal matching. A further, detailed
sub-sequence, the value of the proximity coefficient is comparison of the matrices also makes it possible to un-
less stringent and allows for the possibility of two or more cover the major variations in cue–response contingencies
coherent sub-sequences occurring across an interaction. that underlie the similarities among sequences as a whole.
This should be particularly useful for investigators analyz- Such an approach to sequence comparison has the advan-
ing the structure of interactions that are likely to repeat the tage of not requiring the stipulation of substitution costs,
same phases of action (e.g., decision-making meetings, which are external to the data and can dramatically affect
Poole & Roth, 1989). the results of optimal matching analysis.
Phase analysis. Phase analysis represents the pattern
of behaviors in a sequence by providing a serial map of co- Conclusions
herent periods or phases of interaction (Holmes & Sykes, This article sought to define an empirical method of
1993). The approach enacts a strict case of proximity, in quantifying the interrelationships among sequences of
which behaviors are considered related only when they behaviors. Most existing research has measured these re-
are part of a sequence of identical behaviors or a sequence lationships indirectly, either by imposing extrinsic divi-
of identical behaviors with a predefined number of ex- sions on the data (see, e.g., Taylor, 2002a) or by focusing
ceptions. Because the proximity coefficient is more flex- on consistencies in immediate cue–response transitions
ible than this all-or-none criterion, it is able to provide (see, e.g., Bakeman & Quera, 1995). However, to make
a refined picture of interaction phases. Specifically, the fully empirical our conceptual theories about behavioral
reciprocation values of the proximity coefficient (e.g., processes, it is beneficial to introduce a precise way of
P(A,A)) indicate the extent to which a particular type of measuring the overall structure of localized connections
phase occurs, with higher values reflecting greater occur- among behaviors (Collins, 1981). The concept proposed
rence of phases. The off-diagonals of a proximity coeffi- in this article was proximity: The closer two behaviors
cient matrix indicate the ordering of phases. For example, occur in a sequence, the more they have in common con-
phase analysis is often used to test the prediction that ceptually. To measure proximity, a coefficient was intro-
interactions move through a number of coherent phases duced that expresses the interrelationships among behav-
(Holmes & Sykes, 1993; Poole & Roth, 1989). Within iors as a direct function of their relative placements in a
the framework of proximity, a perfect sequence of phases sequence. The coefficient is a general, computationally
would be represented by a (rearranged) matrix in which simple measure that avoids the arithmetic manipulations
the upper off-diagonal coefficients were missing (since and extrinsic assumptions about the breadth of relations
previous phases should not occur again), the coefficients made by many existing techniques. It also boasts other
on the diagonal equaled 1 (since phases are defined as properties, including a method of analyzing reciproca-
uninterrupted occurrences of a particular code), and the tion and asymmetry among occurrences of behavior, and
lower off-diagonal coefficients were monotonically de- an efficient use of data that enables comparisons across
speakers, among transcripts, and across different sections of the same sequence. Future work may therefore use the coefficient to investigate how complex cue–response sequences underlie the global patterns of movement that are observed over dimensions and constructs of social interaction. To aid investigators in this work, a Perl executable file that calculates the coefficient matrices for behavioral sequences has been developed. It is available from the author by request.

REFERENCES

Abbott, A. (1995). Sequence analysis: New methods for old ideas. Annual Review of Sociology, 21, 93-113.
Bakeman, R., McArthur, D., & Quera, V. (1996). Detecting group differences in sequential association using sampled permutations: Log odds, kappa, and phi compared. Behavior Research Methods, Instruments, & Computers, 28, 446-457.
Bakeman, R., & Quera, V. (1995). Analyzing interaction: Sequential analysis with SDIS and GSEQ. New York: Cambridge University Press.
Beattie, G., & Shovelton, H. (2002). What properties of talk are associated with the generation of spontaneous iconic hand gestures? British Journal of Social Psychology, 41, 403-417.
Chatfield, C., & Lemon, R. E. (1970). Analysing sequences of behavioural events. Journal of Theoretical Biology, 29, 427-445.
Clarke, D. D., Forsyth, R. S., & Wright, R. L. (1998). Junction road accidents during cross-flow turns: A sequence analysis of police case files. Accident Analysis & Prevention, 31, 31-43.
Collins, R. (1981). On the microfoundations of macrosociology. American Journal of Sociology, 86, 984-1014.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26, 297-302.
Donohue, W. A. (1991). Communication, marital dispute, and divorce mediation. Hillsdale, NJ: Erlbaum.
Donohue, W. A. (1998). Managing equivocality and relational paradox in the Oslo peace negotiations. Journal of Language & Social Psychology, 17, 72-96.
Drake, L. E., & Donohue, W. A. (1996). Communicative framing theory in conflict resolution. Communication Research, 23, 297-322.
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 54-77.
Fagen, R. M., & Mankovich, N. J. (1980). Two-act transitions, partitioned contingency tables, and the “significant cells” problem. Animal Behaviour, 28, 1017-1023.
Gottman, J., Markman, H., & Notarius, C. (1977). The topography of marital conflict: A sequential analysis of verbal and nonverbal behavior. Journal of Marriage & the Family, 39, 461-477.
Griffin, W. A. (2000). A conceptual and graphical method for converging multisubject behavioral observational data into a single process indicator. Behavior Research Methods, Instruments, & Computers, 32, 120-133.
Holmes, M. E. (1997). Optimal matching analysis of negotiation phase sequences in simulated and authentic hostage negotiations. Communication Reports, 10, 1-8.
Holmes, M. E., & Sykes, R. E. (1993). A test of the fit of Gulliver’s phase model to hostage negotiations. Communication Studies, 44, 38-55.
Iacobucci, D., & Wasserman, S. (1988). A general framework for the statistical analysis of sequential dyadic interaction data. Psychological Bulletin, 103, 379-390.
Magnusson, M. S. (2000). Discovering hidden time patterns in behavior: T-patterns and their detection. Behavior Research Methods, Instruments, & Computers, 32, 93-110.
Pelz, D. C. (1985). Innovation complexity and the sequence of innovating stages. Knowledge: Creation, Diffusion, Utilization, 6, 261-291.
Poole, M. S., & Roth, J. (1989). Decision development in small groups: IV. A typology of group decision paths. Human Communication Research, 15, 323-356.
Putnam, L. L. (1985). Bargaining as task and process: Multiple functions of interaction sequences. In R. L. Street, Jr. & J. N. Cappella (Eds.), Sequence and pattern in communicative behaviour (pp. 225-242). London: Edward Arnold.
Putnam, L. L., & Jones, T. S. (1982). Reciprocity in negotiations: An analysis of bargaining interaction. Communication Monographs, 49, 171-191.
Sankoff, D., & Kruskal, J. B. (1983). Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Reading, MA: Addison-Wesley.
Sillars, A., Roberts, L. J., Leonard, K. E., & Dun, T. (2000). Cognition during marital conflict: The relationship of thought and talk. Journal of Social & Personal Relationships, 17, 479-502.
Taylor, P. J. (2002a). A cylindrical model of communication behavior in crisis negotiations. Human Communication Research, 28, 7-48.
Taylor, P. J. (2002b). A partial order scalogram analysis of communication behavior in crisis negotiation with the prediction of outcome. International Journal of Conflict Management, 13, 4-37.
Taylor, P. J., Bennell, C., & Snook, B. (2002). Problems of classification in investigative psychology. In K. Jajuga, A. Sokolowski, & H.-H. Bock (Eds.), Classification, clustering, and data analysis: Recent advances and applications (pp. 479-487). Heidelberg: Springer.
Taylor, P. J., & Donald, I. (2003). Foundations and evidence for an interaction-based approach to conflict negotiation. International Journal of Conflict Management, 14, 213-232.
Wampold, B. E. (1989). Kappa as a measure of pattern in sequential data. Quality & Quantity, 23, 171-187.
Watzlawick, P., Beavin, J. H., & Jackson, D. D. (1968). Pragmatics of human communication: A study of interactional patterns, pathologies, and paradoxes. London: Faber.
Yoder, P. J., Bruce, P., & Tapp, J. (2001). Comparing sequential associations within a single dyad. Behavior Research Methods, Instruments, & Computers, 33, 331-338.

(Manuscript received August 6, 2004; revision accepted for publication January 21, 2005.)
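
The permutation procedure described under "Testing the null hypothesis" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's Perl program. The function `proximity` is a simplified stand-in (one minus the mean number of behaviors lying between each occurrence of one behavior and the next occurrence of another, scaled by the largest possible gap) and should be replaced by the coefficient as defined in the article's equations. The names `proximity` and `permutation_p_value`, the upper-tail test, and the add-one correction to the p value are likewise choices made for this sketch.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple


def proximity(seq: Sequence[Hashable], a: Hashable, b: Hashable) -> Optional[float]:
    """Stand-in proximity of b following a (NOT the article's exact equation).

    For every occurrence of `a` that is followed later by some `b`, count the
    behaviors lying between it and the nearest later `b`. The mean count is
    scaled by the largest possible gap (n - 2) and subtracted from 1, so a
    value of 1 means `b` always follows `a` immediately. Returns None
    (a "missing" coefficient) when no `a` is ever followed by a `b`.
    """
    n = len(seq)
    gaps: List[int] = []
    for i, code in enumerate(seq):
        if code != a:
            continue
        for j in range(i + 1, n):
            if seq[j] == b:
                gaps.append(j - i - 1)
                break
    if not gaps:
        return None
    max_gap = max(n - 2, 1)  # guard against division by zero when n < 3
    return 1.0 - (sum(gaps) / len(gaps)) / max_gap


def permutation_p_value(
    seq: Sequence[Hashable],
    a: Hashable,
    b: Hashable,
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed value, upper-tail p value) from shuffled sequences."""
    observed = proximity(seq, a, b)
    if observed is None:
        raise ValueError("observed coefficient is missing; nothing to test")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(seq)
    null: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reordering of the same behaviors
        value = proximity(shuffled, a, b)
        if value is not None:  # skip permutations with a missing coefficient
            null.append(value)
    if not null:
        raise ValueError("no permutation produced a computable coefficient")
    # Share of permuted values at least as high as the observed one,
    # with the common add-one correction (an assumption of this sketch).
    at_least = sum(v >= observed for v in null)
    return observed, (at_least + 1) / (len(null) + 1)
```

Calling `permutation_p_value(list("AAAAABBBBB"), "A", "A")` returns the observed stand-in value together with the share of shuffled sequences whose value is at least as high; a small share suggests that the reciprocation of A is unlikely under random ordering. A two-tailed version would instead count permuted values at least as far from the mean of the empirical distribution as the observed value.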
