
Syst. Biol. 50(5):689–699, 2001

Character Analysis in Morphological Phylogenetics: Problems and Solutions

JOHN J. WIENS
Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania 15213-4080, USA; E-mail: wiensj@carnegiemuseums.org

Abstract. Many aspects of morphological phylogenetics are controversial in the theoretical systematics literature and yet are often poorly explained and justified in empirical studies. In this paper, I argue that most morphological characters describe variation that is fundamentally quantitative, regardless of whether they are coded qualitatively or quantitatively by systematists. Given this view, three fundamental problems in morphological character analysis (definition, delimitation, and ordering of character states) may have a common solution: coding morphological characters as continuous quantitative traits. A new parsimony method (step-matrix gap-weighting, a modification of Thiele's approach) is proposed that allows quantitative traits to be analyzed as continuous variables. The problem of scaling or weighting quantitative characters relative to qualitative characters (and to each other) is reviewed, and three possible solutions are described. The new coding method is applied to data from hoplocercid lizards, and the results show the sensitivity of phylogenetic conclusions to different scaling methods. Although some authors reject the use of continuous, overlapping, quantitative characters in phylogenetic analysis, quantitative data from hoplocercid lizards that are coded using the new approach contain significant phylogenetic structure and exhibit levels of homoplasy similar to those seen in data that are coded qualitatively. [Character coding; morphology; phylogenetics; quantitative characters; weighting.]

Downloaded from http://sysbio.oxfordjournals.org/ at Indian Institute of Science on February 29, 2016

Good science requires clearly explained, repeatable methods. Yet, practitioners of morphological phylogenetics tend not to be explicit about their methodology, specifically, how morphological characters are selected, and how states are defined, delimited, coded, and ordered (a process I refer to as "character analysis"). The lack of methodological explanation in published morphological studies has been discussed by several authors (e.g., Pimentel and Riggins, 1987; Pogue and Mickevich, 1990; Stevens, 1991; Thiele, 1993; Wiens, 1995) and has been documented for character selection (Poe and Wiens, 2000). This is a particularly serious problem, because in contrast to analysis of DNA sequence data, in which character definition and character state delimitation are virtually automatic (the nontrivial problem of alignment notwithstanding), morphological character analysis requires considerable effort, involving many methodological decisions and implicit assumptions at every step in the process.

Many aspects of morphological character analysis are controversial, including the way in which characters are constructed (e.g., Maddison, 1993; Pleijel, 1995; Wilkinson, 1995; Hawkins et al., 1997; Lee and Bryant, 1999; Strong and Lipscomb, 1999), whether intraspecifically variable characters can be included (Pimentel and Riggins, 1987; Nixon and Wheeler, 1990; Stevens, 1991; Campbell and Frost, 1993; Thiele, 1993; Wiens, 1995, 1998; Rae, 1998), how within-species variation is coded (Archie, 1985; Campbell and Frost, 1993; Thiele, 1993; Wiens, 1995, 1999; Swiderski et al., 1998; Smith and Gutberlet, 2001), how character states are ordered (Hauser and Presch, 1991; Lipscomb, 1992; Wilkinson, 1992; Slowinski, 1993), and how different types of morphological characters are weighted relative to each other (e.g., Farris, 1990; Campbell and Frost, 1993; Wiens, 1995, 1998). Different choices and assumptions are important because they can lead to radically different trees (e.g., Wiens, 1995).

In this paper, I suggest that for many morphological characters, these problems and controversies in the selection, definition, delimitation, and ordering of characters may have a common solution. Many, if not most, morphological characters describe variation in quantitative traits (e.g., differences in size, shape, or counts of serially homologous structures), regardless of whether systematists choose to code them quantitatively or qualitatively (Stevens, 1991; Thiele, 1993). Given this, three fundamental problems of character analysis (character state definition, delimitation, and ordering) potentially can be solved by simply


coding these quantitative traits as continuous, quantitative variables. I propose a parsimony method that allows quantitative traits to be analyzed directly as continuous variables (a modification of the gap-weighting method of Thiele [1993]). I then discuss the problem of scaling (or weighting) quantitative characters relative to qualitative characters and to each other and describe three possible solutions to this problem. I demonstrate the new approach to coding, using an empirical data set for hoplocercid lizards and show that phylogenetic results can be highly sensitive to different scaling methods. Finally, although many authors have advocated excluding continuous quantitative characters from phylogenetic analyses, I show that quantitative data from hoplocercid lizards coded by this new approach do contain significant phylogenetic signal and exhibit levels of homoplasy similar to those for data that are coded qualitatively.

TERMINOLOGY

There is often confusion surrounding the terminology of different types of morphological characters. As noted in Thiele's (1993) review, quantitative characters are described using numbers, whether those numbers describe the relative size or shape of a structure (morphometric characters) or a count of serially homologous traits (meristic characters, such as the number of teeth, limbs, or vertebrae). Qualitative characters are described with words (e.g., short, long, present, absent). Continuous characters are characters that can take on any real number value, whether a measurement of a morphometric character in a specific individual, or the mean value of an intraspecifically variable, quantitative trait for a given species (including meristic characters). Discrete characters are those that can take on only a limited subset of all possible values; these can refer to character states (e.g., 0, 1), or raw values for meristic traits (e.g., 20 maxillary teeth). In much of the systematics literature, however, "discrete" is often used to mean characters that show some degree of disjunction between species in ranges of within-species variation (i.e., they have nonoverlapping ranges), and "continuous" is used to mean characters that show little disjunction.

ADVANTAGES OF TREATING MORPHOLOGICAL CHARACTERS AS CONTINUOUS QUANTITATIVE VARIABLES

Morphological characters reported in the phylogenetics literature typically describe variation that is fundamentally quantitative, whether it is variation in relative size or shape of structures or in counts of meristic characters. However, as noted by Stevens (1991) and Thiele (1993), quantitative variation is often coded as discrete through qualitative description or use of a quantitative cutoff (e.g., state 0 = 2–4 scales; state 1 = 5–7 scales). Explicit quantitative coding methods, such as M-coding (Goldman, 1988) and gap-weighting (Thiele, 1993), have been used by some systematists (e.g., Boughton et al., 1991; Chu, 1998; Gutberlet, 1998; Poe, 1998). Other systematists exclude quantitative characters because of so-called "continuous" variation (meaning extensive overlap in ranges of trait values between species), given the idea that such data are unsuitable for phylogeny reconstruction (e.g., Pimentel and Riggins, 1987). However, most published studies discuss neither how variation was coded nor what criteria were used for character selection (see review by Poe and Wiens, 2000).

I make three proposals. First, morphological systematists should explain clearly, and justify, their criteria for selection of characters and their methods of character analysis (i.e., defining, delimiting, coding, and ordering character states). Second, excluding characters because of overlapping ranges of intraspecific variation ("continuous") is unjustified. Intraspecifically variable, overlapping traits can contain useful phylogenetic information, whether the characters are coded quantitatively (Thiele, 1993; this study) or qualitatively (i.e., polymorphic characters; Wiens, 1995, 1998). Furthermore, the distinction between intraspecifically fixed and variable characters may be an artifact of small sample size and qualitative character definition (few characters will be intraspecifically invariant if defined quantitatively). Even though characters showing greater intraspecific variation tend to be more homoplastic (Archie, 1985; Campbell and Frost, 1993; Wiens, 1995), results from real and simulated data sets show that, given a finite sample of characters, including polymorphic

characters consistently increases phylogenetic accuracy relative to excluding them (Wiens and Servedio, 1997; Wiens, 1998). Third, coding quantitative variation as continuous quantitative characters (i.e., weighting character state transformations on the basis of differences in mean trait values using the method proposed in this study) may be preferable to qualitative coding because it can potentially solve three common problems in morphological phylogenetics. These problems are explained in the sections below.

Vague character definitions. The language of many character state descriptions is extremely vague. For example, states are commonly described simply as "wide" versus "narrow," "small" versus "large," or "long" versus "short." The problem is that this description is not clear on how a given specimen is assigned a given state. This problem can be solved by defining the trait quantitatively.

Arbitrary character state delimitation. Many systematists provide explicit quantitative criteria for determining whether a given specimen has a given character state for a given character. In many cases, however, it is not clear how the states were delimited. Typically, a range of values is given for each state, but usually without explanation for why a given set of ranges was chosen, or why a given number of intervals was used (e.g., state 0 = 1–3 scales, state 1 = 4–6 scales; state 0 = olecranon process <20% humerus length, state 1 = process >40% humerus length). Similar cutoffs may also be defined by using qualitative morphological landmarks, such as the presence or absence of contact between two features and whether one structure is longer or shorter than another. In some cases, even the presence or absence of a feature (a classic qualitative character) may be an arbitrary cutoff for a broad range of continuous quantitative variation in the size of the feature (Poe and Wiens, 2000). Gift and Stevens (1997) have shown experimentally how different researchers can divide the same quantitative variation in very different ways, leading to very different character states. This problem can be solved by not using a cutoff at all and instead coding the character as a continuous variable.

Use of cutoffs or ranges may lead to three additional problems that have not been widely appreciated. First, considerable variation within character state ranges may be ignored. For example, given a character with state 0 consisting of 11–14 vertebrae and state 1 of 15–20 vertebrae, two hypothetical taxa with species values of 11 and 14 vertebrae would be coded as identical. Second, differences within intervals may be larger than between intervals. For the above character, a change from 11 to 14 vertebrae is ignored, but a change from 14 to 15 receives maximum weight. Third, use of cutoffs and ranges may not reflect the differences in the amount of change between character states. For example, some range-coded characters are treated such that all gaps between ranges are equal, when this is clearly not the case. If a character is coded such that state 0 = mean species values from 0 to 2, state 1 = 3–4, and state 2 = 11–15, a simple unweighted analysis would not reflect the similarity between states 0 and 1 relative to state 2 (see also Fig. 1). Yet, the degree of similarity between values found in each species was the criterion used in delimiting the states in the first place. All of the problems noted above for the use of quantitative cutoffs are also potentially present in quantitative characters that are treated qualitatively, and all of them can be solved by treating quantitative characters as continuous.

Ordering of character states. The question of whether or not to order character states is controversial. For many characters describing quantitative variation, systematists generally assume that trait values that are similar but not identical between taxa can be lumped into the same state. The assumption underlying this approach, that there is a special similarity between taxa with similar trait values, also supports analyzing these characters as continuous variables and provides a logical basis for ordering quantitative characters, regardless of whether they are coded qualitatively or quantitatively.

FIGURE 1. Hypothetical data showing differences between size of gaps between mean values of species, and the potential importance of gap-weighting.

OBJECTIONS TO QUANTIFICATION

Most of the putative disadvantages of quantitative coding are shared with qualitative coding (Zelditch et al., 2000). For example, systematists may be concerned about how one derives characters from morphometric data, and if quantitative traits are correlated with body size (or with each other), exhibit variation caused by phenotypic plasticity, or require large sample sizes to be included. However, none of these potential problems are created by quantification; they exist independently of whether the characters are treated quantitatively or qualitatively (Zelditch et al., 2000).

CODING CONTINUOUS MORPHOLOGICAL VARIABLES USING STEP-MATRIX GAP-WEIGHTING

Several authors have proposed treating intrinsically quantitative variables quantitatively and have developed various methods to do this (e.g., gap-coding: Mickevich and Johnson, 1976; generalized gap-coding: Archie, 1985; segment-coding: Colless, 1980; Thorpe, 1984; Chappill, 1989; M-coding: Goldman, 1988; gap-weighting: Thiele, 1993; finite-mixture coding: Strait et al., 1996; overlap-coding: Swiderski et al., 1998). Thiele (1993) proposed gap-weighting as a method for treating continuous variables as more-or-less continuous, by giving large weights to large differences in trait means between species, and small weights to small differences. Thiele's implementation of gap-weighting involves finding, for a given character, the mean value of the trait in each species in the analysis, the range of mean species values among taxa (i.e., the species with the greatest mean value and the species with the lowest), and then dividing this range into smaller ranges or segments equal to the maximum number of character states allowed by the phylogenetic software program (e.g., 32 for PAUP*). Species are then assigned states based on these ranges, and the character is ordered. Evolving from low to high mean trait values (or vice versa) therefore requires passing through many intermediate states and requires many steps, whereas smaller changes in trait values involve fewer state changes and fewer steps.

An important advantage of the gap-weighting method is that it incorporates information on the distance between states, weighting the changes according to the difference between mean species values (hence the name). For example, an analysis of the data in Fig. 1 using gap-coding, finite-mixture coding, or overlap-coding might reveal evidence for three character states. However, given that the degree of similarity between trait values is important and phylogenetically informative (in fact, it is the criterion used to delimit states in the first place), changes between states 0 and 1 should be much easier than evolving from either of these states to state 2. Yet, all methods but gap-weighting and segment-coding ignore this information.

Where Thiele's method falls short of treating continuous variables as continuous is in the limited number of states. I propose a method that circumvents this limitation by weighting the gaps between mean species values with step matrices. I call this approach step-matrix gap-weighting. For a given character, each taxon with a unique mean trait value is assigned a unique character state, and the costs of changes between these states are specified with a step matrix, based on the difference in mean trait values between each pair of species. The maximum cost between states in a step matrix is 1,000 in PAUP* (Swofford, 1998), and 999 in MacClade (Maddison and Maddison, 1992); using the largest value possible allows the most fine-grained weighting. To implement the method for a given character, the mean trait value ($x$) for a given species is converted to a score ($x_S$) between 0 and 1,000 (or 999) by range-standardizing the data according to the following formula (from Thiele, 1993):

$$x_S = \frac{x - \min}{\max - \min} \times 1{,}000$$

where min is the minimum (lowest) mean species value of the trait across all species and max the maximum. The cost of a transformation between each character state (or taxon) in the step matrix is simply the difference between these scores. A simplified example of this coding method is shown in Figure 2, and a program to implement this method is available from the author.

Analysis of quantitative characters using step matrices does have some disadvantages, however. First, analyses are potentially


FIGURE 2. Hypothetical example showing how a quantitative character is coded with step-matrix gap-weighting.
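
The coding procedure that Figure 2 illustrates can also be sketched in code. This is a minimal sketch under stated assumptions, not the author's program: the species names and means are hypothetical, the function names are introduced here for illustration, and rounding the scores to whole numbers is an assumption. It applies the range-standardizing formula given in the text and then takes the absolute difference between scores as the cost of each transformation.

```python
MAX_COST = 1000  # largest step-matrix cost the text reports for PAUP*


def range_standardize(means, max_cost=MAX_COST):
    """Rescale species means to scores between 0 and max_cost."""
    low, high = min(means.values()), max(means.values())
    if high == low:
        raise ValueError("character does not vary among species")
    # Rounding to whole numbers is an assumption made for this sketch.
    return {sp: round((x - low) / (high - low) * max_cost)
            for sp, x in means.items()}


def step_matrix(scores):
    """Cost of each transformation = absolute difference between scores."""
    return {a: {b: abs(sa - sb) for b, sb in scores.items()}
            for a, sa in scores.items()}


# Hypothetical species means (e.g., mean number of vertebrae).
means = {"sp_A": 10.0, "sp_B": 10.5, "sp_C": 13.0, "sp_D": 20.0}
scores = range_standardize(means)
costs = step_matrix(scores)

for species, row in costs.items():
    print(species, scores[species], row)
```

With these hypothetical means, the species with the lowest mean scores 0 and the species with the highest scores 1,000, so the largest transformation receives the maximum cost and every other transformation is proportional to the difference in means. Species that share a mean receive the same score and a cost of 0 between them, which corresponds to sharing one character state.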

constrained by the number of distinct states allowed by the computer software package. This makes it difficult to include very large numbers of taxa (>32 for PAUP or PAUP*) with unique trait means. If the number of taxa with unique means is too large, I recommend using Thiele's (1993) gap-weighting method. This method uses less-fine-grained information but has no limits on the number of taxa that can be coded. Second, when using step-matrix gap-weighting, the only states that are reconstructed at ancestral nodes are those that occur within terminal taxa. However, to what extent (if any) this negatively impacts tree reconstruction is unclear, and simulations and congruence studies with polymorphic characters coded with frequency-based step matrices do not suggest that this problem limits phylogenetic accuracy (Wiens and Servedio, 1997, 1998; Wiens, 1998).

Some systematists may object to step-matrix gap-weighting because it requires assumptions about evolutionary processes. However, as stated above, this method is simply a logical extension of the same assumption that is widely used by morphological systematists when they code intrinsically quantitative characters. Morphological character states typically describe ranges of trait values, regardless of whether the states are defined quantitatively (e.g., state 0 = frontal process length >50% nasal length vs. state 1 = frontal process <40% nasal length) or qualitatively (e.g., frontal process long vs. short). Thus, systematists implicitly assume that taxa sharing similar but nonidentical trait values should be more closely related than taxa sharing more dissimilar trait values. They assume that traits will generally evolve gradually, rather than leaping from

low to high trait values and vice versa (i.e., they assume no a priori homoplasy in quantitative trait values). This assumption is little more than an extension of parsimony to character state definition; the minimum amount of change is assumed a priori. This assumption is also supported by the fields of empirical and theoretical quantitative genetics (Lynch and Walsh, 1998), which show that a character is generally more likely to evolve to a similar trait value (e.g., from a low mean number of ventral scales to a different low number) than to a dissimilar value (e.g., from a low to a high number of ventral scales).

Distance and likelihood methods have also been developed that can treat continuous morphological data as continuous (e.g., Felsenstein, 1981, 1988; Schluter, 1984; Lynch, 1989), and these methods may be advantageous relative to parsimony under some conditions (e.g., Wiens and Servedio, 1998). However, current applications of these methods do not readily allow for combining qualitative and quantitative traits, which may make them difficult to apply to many real data sets.

SCALING AND WEIGHTING QUANTITATIVE CHARACTERS

Not all characters are readily treated quantitatively, and a morphological analysis may contain a mixture of characters coded qualitatively and quantitatively (e.g., Chu, 1998; Gutberlet, 1998; Poe, 1998). How we weight or scale characters of different types relative to each other is an important issue that has received relatively little discussion (Farris, 1990; Thiele, 1993). For example, explicitly treating the relative length of a bone as a morphometric character with Thiele's (1993) gap-weighting method results in 32 ordered character states. Treating the same character as a qualitative trait (e.g., long vs. short) yields only two states. If the character is given equal weight relative to qualitative characters in both, the weight of the maximum change in the same character is 31 times greater when treated quantitatively rather than qualitatively. The problem is even worse with step-matrix coding; the maximum length of the character is 1 when treated qualitatively and 1,000 when treated quantitatively. These dramatic differences in weight clearly are unjustified. Three approaches might be used to adjust the weight of quantitative characters: between-character scaling, between-state scaling, and statistical scaling.

Between-character scaling. Various authors have recommended weighting or scaling quantitative characters to be equal to each other and to qualitative characters (e.g., Thiele, 1993). The goal is to ensure that quantitative and (binary) qualitative characters have the same maximum length, an approach I label between-character scaling. For quantitative characters coded using step matrices with a maximum weight of 1,000, this equal weighting can be achieved simply by giving non-step-matrix characters a weight of 1,000. This seems to be a reasonable approach, particularly for morphometric characters.

FIGURE 3. Hypothetical example illustrating the continuum from binary to multistate to meristic characters.

The appropriate scaling for meristic characters is less clear. When meristic characters are viewed simply in terms of comparing mean species values, they are clearly continuous traits that are similar to morphometric characters. Under this view, between-character scaling may be most appropriate for meristic characters. When we consider the raw meristic data within species, meristic characters can also be viewed as discrete characters that are typically polymorphic and have many states. For characters involving the number of serially homologous structures, a continuum exists, running from binary characters, to multistate characters, to meristic characters; where a character falls on this continuum depends largely on the range of trait values, such that a small range implies a small number of states (Fig. 3). This continuum brings to mind Farris' (1990) question: Should meristic characters be downweighted merely because they have many states? For example, say that we observe taxa fixed for vertebral numbers of 10 and 11 among the species of a given group (Fig. 3). This is an

obviously discrete binary character that would have the same weight as any other traditional qualitative character. If we also observe taxa fixed for 12 and 13 vertebrae in the same group, and we assume the character is ordered, then applying between-character scaling to this character would make the cost (weight) of going from 10 to 11 vertebrae decrease to 33% of its original weight (i.e., because the cost of going from 10 to 13 is scaled to be equal to the cost of going from 0 to 1 in a fixed character, the cost of going from 10 to 11 decreases to one-third). But if the standard for weighting characters is the change in the frequency of adjacent character states from 0 to 100% (i.e., a change from 0 to 1 in a fixed character), then a change from 10 vertebrae in all specimens of a given species to 13 vertebrae in another should instead be equivalent to three changes between fixed traits, and should not be downweighted.

Between-state scaling. In the context of step-matrix analysis of quantitative variables, what we need is a weighting scheme in which transformations between species with fixed, adjacent values of meristic variables (e.g., 10 to 11 vertebrae) receive the same weight as changes in binary variables (0 to 1,000), and more variable species with intermediate mean values (e.g., 10.5) receive proportionally intermediate weights. This can be accomplished by weighting each meristic character by the difference between the maximum and minimum mean species trait values (across all species in the analysis) for that character. I call this approach between-state scaling. An important advantage of this method relative to between-character scaling is that the cost of transformation between fixed, adjacent trait values (e.g., from 10 to 11 vertebrae) remains constant, regardless of the values that occur in other species.

A disadvantage of between-state scaling is that if the range of character state values is extremely high (e.g., 10 to 200), then characters weighted by this approach may have a very powerful influence on the phylogenetic results. Unfortunately, exactly where one draws the line in this continuum from multistate to meristic characters is unclear.

Statistical scaling. A third approach that might be applied to scaling both morphometric and meristic characters is to combine the step-matrix gap-weighting method with the statistical methods designed for determining distinct states (e.g., finite-mixture coding, gap-coding). The statistical methods could be used to determine the number of distinct states for each character, and the number of distinct states minus one could be used as a weighting function for each step-matrix coded character. Using this method, the cost of a change between the lowest and highest mean species trait values would be equivalent to the maximum length of an ordered qualitative character; whether it was equivalent to a qualitative character with two states, four states, or more would depend on how many states were determined to be statistically distinct. The cost of changes between taxa with intermediate trait means would remain proportional to the difference between trait means, as for all characters coded using step-matrix gap-weighting. Thus, a character in which there are only two distinct states would receive a weight of one (equivalent to a fixed, binary character). A character with three statistically distinct states would receive a weight of two, such that changes from the lowest mean to the highest mean would have a cost of 2,000; this would be equivalent to two steps, or a change from 0 to 1 to 2 in a fixed, ordered, multistate, qualitative character. This weighting scheme, called statistical scaling, has the advantage of incorporating all the relevant information on the distance between species means, as well as some information on the variability of traits within taxa. However, this approach shares the same disadvantages of the statistical methods for character state delimitation. For example, if sample sizes are small or there are few gaps between taxa (despite a large difference in range of mean values between species), the character will receive little or no weight, even though these same restrictions are not applied to qualitatively coded characters.

In some ways, these three scaling methods do not really represent differential character weighting. Instead, they represent different ways of maintaining equal weights among characters, with each method based on a different concept of what the common currency of equal weighting should be, namely, overall character length (between-character scaling), transformations between fixed, discrete states (between-state scaling), or transformations between statistically distinct states (statistical scaling). The best overall currency is at present unclear. (Note: All three scaling methods are also applicable to data that are gap-weighted by Thiele's [1993] method.)

The uncertainty over the best scaling method, combined with the sensitivity of phylogenetic results to different scaling methods (Fig. 4), might be seen as a serious drawback of treating data quantitatively. But this is a case where quantitative analysis calls for explicit treatment of a general problem that is present but typically ignored with qualitative coding. For example, without quantitative methods for delimiting character states, a phylogenetic analysis of qualitatively coded data can be influenced strongly by a single character, if the author chooses to divide the character into a large number of states (in an analysis in which all character state transformations are given equal weight).

FIGURE 4. The impact of different weighting schemes for meristic characters on phylogenetic hypotheses for hoplocercid lizards. (a) Between-character scaling (meristic characters have the same maximum length as fixed, binary characters). (b) Between-state scaling (changes between numerically adjacent, fixed, trait values of meristic characters have the same length as fixed, binary characters). Numbers at nodes indicate bootstrap values >50%. Hoplocercid taxon names are in bold face. Outgroup taxa include acrodontans (Leiolepis, Physignathus), polychrotids (Polychrus, Pristidactylus), and iguanids (Brachylophus, Ctenosaura, Dipsosaurus). Monophyly of acrodontans, polychrotids, and iguanids was constrained during tree searches but not the relationships within or between them. See Appendices 1–3 at www.systbiol.org and Wiens and Etheridge (unpubl. manuscript) for further details.

AN EMPIRICAL EXAMPLE OF QUANTITATIVE CODING AND SCALING

I have recently applied the step-matrix gap-weighting approach outlined in this paper to a phylogenetic analysis of morphological data in hoplocercid lizards (Wiens and Etheridge, unpubl. manuscript). This is a family of 10 Neotropical species that are currently divided among three genera (Enyalioides, Hoplocercus, Morunasaurus). A total of 46 informative characters (squamation, coloration, osteology) were scored for 10 ingroup taxa and 7 outgroup taxa. Seventeen characters were qualitative and intraspecifically invariable, 19 were qualitative and polymorphic, 8 were meristic, and 2 were morphometric. Qualitative polymorphic characters were coded with the step-matrix frequency approach (Wiens, 1995, 1999; Berlocher and Swofford, 1997), and meristic and morphometric characters were coded with the step-matrix gap-weighting method described in the present paper. Two methods for scaling meristic characters were used: between-character scaling and between-state scaling. The third method (statistical scaling) outlined in the previous section was not attempted because of the very small sample sizes available for most species of hoplocercids, particularly for osteological characters. The list of characters, the trait means and frequencies for quantitative and polymorphic characters, and the coded data matrix are available as Appendices 1–3 at the Society of Systematic Biologists website (www.systbiol.org).

Several authors have stated that characters with extensively overlapping values between species (i.e., polymorphic, meristic, and morphometric characters) do not contain useful phylogenetic information and should therefore be excluded from phylogenetic analyses (e.g., Pimentel and Riggins, 1987; Stevens, 1991). I tested whether or not these three data types contained significant phylogenetic information relative to random data, using randomization tests on two measures of phylogenetic signal: the g1 index (Hillis and Huelsenbeck, 1992) and
the consistency index (ci; Kluge and Farris, 1969). Seven data sets were analyzed: (1) all characters (meristic characters weighted with between-state scaling), (2) all characters (meristic characters with between-character scaling), (3) meristic characters only (with between-state scaling), (4) meristic characters only (with between-character scaling), (5) fixed characters only, (6) polymorphic characters only, and (7) morphometric characters only (with between-character scaling). Each data set was randomized 100 times, by randomly shuffling states among taxa within a given character (using a program supplied by J. P. Huelsenbeck). The number of states and the ordering and weighting of states and characters were maintained in each of the randomized data sets. The ci for each randomized data set was obtained by using a heuristic search to find the shortest tree (with tree-bisection-reconnection branch swapping and 20 random addition sequence replicates per search). The g1 index for each randomized data set was calculated by taking a random sample of 10,000 trees from among all possible trees for that data matrix. For each of the original seven data sets, the 99% confidence interval of the mean g1 index and ci was calculated for the 100 randomized data matrices. If the observed statistic (for the nonrandomized data) fell outside of this confidence interval, the data set was considered to contain significant, nonrandom phylogenetic information. To confirm that the phylogenetic structure occurred within the ingroup, the outgroup taxa were removed from all data sets for these analyses. In addition to the analyses of phylogenetic structure, I also qualitatively compared the average ci's of the different character types (i.e., fixed, polymorphic, meristic, morphometric) in the trees from the between-state scaling and between-character scaling. Step matrices were constructed using MacClade, and phylogenetic analyses were conducted with PAUP* (version 4.0.0d63). Support for individual branches was evaluated with nonparametric bootstrapping (Felsenstein, 1985; Hillis and Bull, 1993), using 500 pseudoreplicates per analysis with five random-addition sequence replicates per bootstrap pseudoreplicate.

Different methods for scaling meristic characters produced very different trees (Fig. 4). With between-character scaling, the tree within hoplocercids is highly incongruent with previous taxonomy but relatively similar to the only other phylogenetic study of the group (Etheridge and de Queiroz, 1988), with the genus Enyalioides forming a paraphyletic series of lineages at the base of the tree leading to a clade containing the genera Morunasaurus (which is paraphyletic) and Hoplocercus. In the tree based on between-state scaling, Hoplocercus and a paraphyletic Morunasaurus are at the base, and Enyalioides is a well-supported monophyletic group. These trees also differ in numerous placements of individual species of Enyalioides, although many branches of both trees (except for the monophyly of the family and the branch separating Enyalioides from the other genera) are relatively weakly supported.

The data sets have significant phylogenetic structure using both scaling methods as do the separately analyzed meristic, fixed, polymorphic, and morphometric characters (Table 1). Much of this signal may be concentrated in a few nodes, because many branches of both trees are weakly supported. Nevertheless, the average ci values among the different types of characters are generally similar (Table 2). Regardless of the scaling method used, the meristic characters have the highest ci values and the morphometric characters have the lowest.

TABLE 1. Results of randomization tests showing significant phylogenetic structure in different types of morphological data from hoplocercid lizards, using two statistics (g1 and consistency index [ci]). The critical value refers to the 99% confidence interval from 100 randomized data matrices. When observed values for a given statistic fall outside the confidence interval for randomized data, the data are considered to contain significant phylogenetic structure.

Data type                               Statistic   Observed value   Critical value
All data (between-state scaling)        g1          0.985            0.214
                                        ci          0.625            0.565
All data (between-character scaling)    g1          0.925            0.250
                                        ci          0.631            0.555
Meristic (between-state scaling)        g1          0.559            0.304
                                        ci          0.700            0.672
Meristic (between-character scaling)    g1          0.930            0.530
                                        ci          0.692            0.646
Polymorphic                             g1          0.609            0.240
                                        ci          0.577            0.510
Fixed                                   g1          0.646            0.428
                                        ci          0.621            0.576
Morphometric                            g1          1.014
                                        ci          0.829            0.778

This study demonstrates that the quantitative character data for these lizards do contain significant phylogenetic information,
at least when coded with step matrices. The desire to avoid continuous variation is one of the most widely cited criteria for excluding characters in morphological phylogenetic studies (Poe and Wiens, 2000), and many authors have condemned the use of overlapping quantitative character data in phylogenetic analysis (e.g., Pimentel and Riggins, 1987; Stevens, 1991). Yet, the only other study to test statistically for phylogenetic structure (or lack thereof) in such data was Thiele's (1993) study in plants (genus Banksia). Thiele (1993) also found significant phylogenetic information in the quantitative characters he examined, and coded these characters with a method (gap-weighting with bins) similar to the step-matrix approach used in the present study. The results also show that levels of homoplasy in the meristic and morphometric data can be similar to those observed in the qualitative characters. In fact, the meristic characters are the least homoplastic set of characters in this analysis. The results of this study and the study of Thiele (1993) support the inclusion of overlapping meristic and morphometric data in phylogeny reconstruction.

TABLE 2. Consistency indices of the different types of characters for the trees in Figure 4 (n = number of parsimony-informative characters of each type).

Character type   n    Between-state scaling           Between-character scaling
Fixed            17   0.439 ± 0.238 (0.167–1.000)     0.308 ± 0.128 (0.167–0.500)
Polymorphic      19   0.406 ± 0.208 (0.167–0.945)     0.316 ± 0.237 (0.142–0.945)
Meristic         8    0.490 ± 0.057 (0.417–0.599)     0.349 ± 0.059 (0.258–0.417)
Morphometric     2    0.390 ± 0.110 (0.312–0.467)     0.307 ± 0.082 (0.249–0.365)

SUMMARY

Many of the characters used by morphological systematists describe variation in continuous, quantitative traits, regardless of whether these traits are coded quantitatively or not. Given this view, there may be many advantages to treating these characters explicitly as continuous quantitative characters, and I have proposed new coding and scaling methods to implement this approach. Much work remains to be done on testing the accuracy of different coding and scaling methods and comparing the accuracy of parsimony, distance, and likelihood methods for quantitative traits. Congruence analyses (e.g., Wiens, 1998), which allow phylogenetic accuracy to be addressed with empirical data sets, should be particularly useful in this area.

ACKNOWLEDGMENTS

I thank Chris Beard, Maureen Kearney, Brad Livezey, Zhexi Luo, Dick Olmstead, Steve Poe, John Rawlins, Maria Servedio, Peter Stevens, and John Wible for comments on the manuscript, and Richard Etheridge for use of our hoplocercid data set for this paper.

REFERENCES

ARCHIE, J. W. 1985. Methods for coding variable morphological features for numerical taxonomic analysis. Syst. Zool. 34:326–345.
BERLOCHER, S. H., AND D. L. SWOFFORD. 1997. Searching for phylogenetic trees under the frequency parsimony criterion: An approximation using generalized parsimony. Syst. Biol. 46:211–215.
BOUGHTON, D. A., B. B. COLETTE, AND A. R. MCCUNE. 1991. Heterochrony in jaw morphology of needlefishes (Teleostei: Belonidae). Syst. Zool. 40:329–354.
CAMPBELL, J. A., AND D. R. FROST. 1993. Anguid lizards of the genus Abronia: Revisionary notes, descriptions of four new species, phylogenetic analysis, and key. Bull. Am. Mus. Nat. Hist. 216:1–121.
CHAPPILL, J. A. 1989. Quantitative characters in phylogenetic analysis. Cladistics 5:217–234.
CHU, P. C. 1998. A phylogeny of the gulls (Aves: Larinae) inferred from osteological and integumentary characters. Cladistics 14:1–43.
COLLESS, D. H. 1980. Congruence between morphometric and allozyme data for Menidia species: A reappraisal. Syst. Zool. 29:288–299.
ETHERIDGE, R., AND K. DE QUEIROZ. 1988. A phylogeny of Iguanidae. Pages 283–368 in Phylogenetic relationships of the lizard families: Essays commemorating Charles L. Camp (R. Estes and G. Pregill, eds.). Stanford Univ. Press, Stanford, California.
FARRIS, J. S. 1990. Phenetics in camouflage. Cladistics 6:91–100.
FELSENSTEIN, J. 1981. Evolutionary trees from gene frequencies and quantitative characters: Finding maximum likelihood estimates. Evolution 35:1229–1242.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
FELSENSTEIN, J. 1988. Phylogenies and quantitative characters. Annu. Rev. Ecol. Syst. 19:445–471.
GIFT, N., AND P. F. STEVENS. 1997. Vagaries in the delimitation of character states in quantitative variation: An experimental study. Syst. Biol. 46:112–125.
GOLDMAN, N. 1988. Methods for discrete coding of variable morphological features for numerical analysis. Cladistics 4:59–71.
GUTBERLET, R. L., JR. 1998. The phylogenetic position of the Mexican black-tailed pitviper (Squamata: Viperidae: Crotalinae). Herpetologica 54:184–206.
HAUSER, D. L., AND W. PRESCH. 1991. The effect of ordered characters on phylogeny reconstruction. Cladistics 7:243–265.
HAWKINS, J. A., C. E. HUGHES, AND R. W. SCOTLAND. 1997. Primary homology assessment, characters and character states. Cladistics 13:275–283.
HILLIS, D. M., AND J. J. BULL. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182–192.
HILLIS, D. M., AND J. P. HUELSENBECK. 1992. Signal, noise, and reliability in molecular phylogenetic analyses. J. Hered. 83:189–195.
KLUGE, A. G., AND J. S. FARRIS. 1969. Quantitative phyletics and the evolution of anurans. Syst. Zool. 18:1–32.
LEE, D.-C., AND H. N. BRYANT. 1999. A reconsideration of the coding of inapplicable characters: Assumptions and problems. Cladistics 15:373–378.
LIPSCOMB, D. L. 1992. Parsimony, homology, and the analysis of multistate characters. Cladistics 8:45–65.
LYNCH, M. 1989. Phylogenetic hypotheses under the assumption of neutral quantitative-genetic variation. Evolution 43:1–17.
LYNCH, M., AND B. WALSH. 1998. Genetics and analysis of quantitative traits. Sinauer, Sunderland, Massachusetts.
MADDISON, W. P. 1993. Missing data versus missing characters in phylogenetic analysis. Syst. Biol. 42:576–581.
MADDISON, W. P., AND D. R. MADDISON. 1992. MacClade Ver. 3.0. Analysis of phylogeny and character evolution. Sinauer, Sunderland, Massachusetts.
MICKEVICH, M. F., AND M. F. JOHNSON. 1976. Congruence between morphological and allozyme data in evolutionary inference and character evolution. Syst. Zool. 25:260–270.
NIXON, K. C., AND Q. D. WHEELER. 1990. An amplification of the phylogenetic species concept. Cladistics 6:211–223.
PIMENTEL, R., AND R. RIGGINS. 1987. The nature of cladistic data. Cladistics 3:201–209.
PLEIJEL, F. 1995. On character coding for phylogeny reconstruction. Cladistics 11:309–315.
POE, S. 1998. Skull characters and the cladistic relationships of the Hispaniolan dwarf twig Anolis. Herpetol. Mon. 12:192–236.
POE, S., AND J. J. WIENS. 2000. Character selection and the methodology of morphological phylogenetics. Pages 20–36 in Phylogenetic analysis of morphological data (J. J. Wiens, ed.). Smithsonian Institution Press, Washington, D.C.
POGUE, M., AND M. MICKEVICH. 1990. Character definitions and character-state delimitations: The bete noire of phylogenetic inference. Cladistics 6:319–361.
RAE, T. C. 1998. The logical basis for the use of continuous characters in phylogenetic systematics. Cladistics 14:221–228.
SCHLUTER, D. 1984. Morphological and phylogenetic relations among the Darwin's finches. Evolution 38:921–930.
SLOWINSKI, J. B. 1993. Unordered versus ordered characters. Syst. Biol. 42:155–165.
SMITH, E. N., AND R. L. GUTBERLET, JR. 2001. Generalized frequency coding: a method of preparing polymorphic multistate characters for phylogenetic analysis. Syst. Biol. 50:156–169.
STEVENS, P. F. 1991. Character states, morphological variation, and phylogenetic analysis: A review. Syst. Bot. 16:553–583.
STRAIT, D., M. MONIZ, AND P. STRAIT. 1996. Finite mixture coding: A new approach to coding continuous characters. Syst. Biol. 45:67–78.
STRONG, E. E., AND D. LIPSCOMB. 1999. Character coding and inapplicable data. Cladistics 15:363–371.
SWIDERSKI, D. L., M. L. ZELDITCH, AND W. L. FINK. 1998. Why morphometrics is not special: Coding quantitative data for phylogenetic analysis. Syst. Biol. 47:508–519.
SWOFFORD, D. L. 1998. PAUP*: Phylogenetic analysis using parsimony, Ver. 4.0.0d63. Sinauer, Sunderland, Massachusetts.
THIELE, K. 1993. The Holy Grail of the perfect character: The cladistic treatment of morphometric data. Cladistics 9:275–304.
THORPE, R. S. 1984. Coding morphometric characters for constructing distance Wagner networks. Evolution 38:244–255.
WIENS, J. J. 1995. Polymorphic characters in phylogenetic systematics. Syst. Biol. 44:482–500.
WIENS, J. J. 1998. Testing phylogenetic methods with tree-congruence: Phylogenetic analysis of polymorphic morphological characters in phrynosomatid lizards. Syst. Biol. 47:411–428.
WIENS, J. J. 1999. Polymorphism in systematics and comparative biology. Annu. Rev. Ecol. Syst. 30:327–362.
WIENS, J. J., AND M. R. SERVEDIO. 1997. Accuracy of phylogenetic analysis including and excluding polymorphic characters. Syst. Biol. 46:332–345.
WIENS, J. J., AND M. R. SERVEDIO. 1998. Phylogenetic analysis and intraspecific variation: Performance of parsimony, likelihood, and distance methods. Syst. Biol. 47:228–253.
WILKINSON, M. 1992. Ordered versus unordered characters. Cladistics 8:375–385.
WILKINSON, M. 1995. A comparison of two methods of character construction. Cladistics 11:297–308.
ZELDITCH, M. L., D. L. SWIDERSKI, AND W. L. FINK. 2000. Discovery of phylogenetic characters in morphometric data. Pages 37–83 in Phylogenetic analysis of morphological data (J. J. Wiens, ed.). Smithsonian Institution Press, Washington, D.C.

Received 7 April 2000; accepted 23 November 2000
Associate Editor: A. Brower
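
The randomization test used in the empirical example (states shuffled among taxa within each character, 100 randomized matrices, and a 99% confidence interval of the mean of the randomized statistic) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the statistic is passed in as a user-supplied function standing in for the ci or g1 index (both of which require tree searches that are not reproduced here), and the interval uses a normal approximation (z = 2.576), which the text does not specify.

```python
import random
import statistics


def shuffle_within_characters(matrix, rng):
    """Return a copy of the matrix with each character's states
    randomly reassigned among taxa.

    matrix: dict mapping taxon name -> list of character states.
    The set of states observed for each character is preserved.
    """
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    shuffled = {taxon: [None] * n_chars for taxon in taxa}
    for j in range(n_chars):
        column = [matrix[taxon][j] for taxon in taxa]
        rng.shuffle(column)  # permute states among taxa for character j
        for taxon, state in zip(taxa, column):
            shuffled[taxon][j] = state
    return shuffled


def randomization_test(matrix, statistic, n_reps=100, z=2.576, seed=0):
    """Compare an observed statistic with its randomized distribution.

    statistic: function taking a matrix and returning a number
               (a stand-in for ci or g1; assumed, not from the paper).
    z: normal quantile for a 99% interval of the mean (assumption).
    Returns (observed, (low, high), outside_interval).
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    null_values = [
        statistic(shuffle_within_characters(matrix, rng))
        for _ in range(n_reps)
    ]
    mean = statistics.mean(null_values)
    half_width = z * statistics.stdev(null_values) / len(null_values) ** 0.5
    low, high = mean - half_width, mean + half_width
    outside = not (low <= observed <= high)
    return observed, (low, high), outside
```

Shuffling column by column keeps the number of states per character unchanged, which matches the requirement that the number of states be maintained in each randomized data set; any ordering or weighting of states would be carried along unchanged by whatever statistic function is supplied.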
