Assessing Statistical Techniques For Detecting Multispecies Samples

Chapter 9: Assessing Statistical Techniques for Detecting Multispecies Samples of
Heteromyids in the Fossil Record: A Test Using Extant Dipodomys

Author(s): MARC A. CARRASCO
Source: Bulletin of the American Museum of Natural History, 285:120-129. 2004.
Published By: American Museum of Natural History
DOI: http://dx.doi.org/10.1206/0003-0090(2004)285<0120:C>2.0.CO;2
URL: http://www.bioone.org/doi/full/10.1206/0003-0090%282004%29285%3C0120%3AC%3E2.0.CO%3B2
ABSTRACT
Sixteen dental measurements in nineteen species of the extant rodent genus Dipodomys
were examined to determine which techniques commonly used to identify the presence of
multiple species in qualitatively homogeneous fossil samples are reliable. Each technique was
tested using a simulation approach whereby samples created from a sympatric pooled-species
group were compared with those of a single-species referent to determine the power of each
technique. The type-I error rate of each method was assessed by comparing simulated pooled
samples created from a single species to the same referent. Most techniques, including all
range-based methods, performed poorly. Only the coefficient of variation using a 1% significance
level and Levene's test of relative variation using a 2.5% significance level were reliable.
The most useful dental variables were the widths of the upper and lower first and second
molars.
INTRODUCTION

Species recognition in the fossil record is an ongoing problem. Groups that lack qualitative differences among taxa, such as rodents and primates, have forced workers to use quantitative variables to address taxonomic questions (e.g., Barnosky, 1986; Martin and Andrews, 1993; Carrasco, 1998). Many observed distributions of closely related sympatric species frequently overlap, thus obscuring taxonomic boundaries and masking the true taxonomic diversity of a fossil assemblage (Plavcan, 1993). To this end, more than ten different techniques have been proposed, using only quantitative characters (primarily dental characters), to recognize the presence of closely related sympatric species in fossil samples. Despite several attempts to compare select groups of these methods (e.g., Cope and Lacy, 1992, 1995; Martin and Andrews, 1993; Donnelly and Kramer, 1999), considerable controversy exists over which technique is the best. In addition, previous studies on the efficacy of these methods have primarily used primates as the study group (e.g., Gingerich, 1974; Cope and Lacy, 1992; Cope, 1993; Martin and Andrews, 1993; Plavcan, 1993; Cameron, 1998). Thus the generality of those conclusions to other taxonomic groups is unclear.

In an attempt to determine how broadly applicable the previous studies' results are, this study investigates which, if any, of the most commonly employed species-recognition methods are useful in a nonprimate group, the extant heteromyid rodent Dipodomys. Dipodomys was selected because fossil samples of heteromyids are composed primarily of isolated teeth with a simple, conservative dental morphology (Wahlert, 1993). This morphology has forced workers to rely on quantitative characters to diagnose fossil heteromyid species (e.g., Barnosky, 1986; Wahlert, 1993; Carrasco, 1998). Therefore, results obtained here will be directly applicable to problematic fossil samples. To test the various procedures with extant taxa, a simulation approach was used whereby simulated samples of defined sizes were created from both a multispecies sample of closely related sympatric taxa and a large single-species sample. These simulated samples were then compared to closely related single-species referents to determine the reliability of each method.

MATERIALS AND METHODS

SAMPLES

More than 3700 specimens from 19 different species were analyzed. See Carrasco (1999) for a complete list of specimens examined and locality information. Each species was separated into geographically restricted pooled samples, with the assumption that a smaller range of geographic sampling correlates with less intraspecific variation. In total, 50 geographically restricted pooled samples were created with sample sizes ranging from 22 to 132. All pooled geographic samples encompassed a range with a radius less than 225 kilometers.

A multispecies pooled sample was created by pooling available specimens of three sympatric species from a limited geographic region (less than a 110 km radius). The multispecies sample was from northern Baja California and included a total of 274 specimens of D. gravipes, D. merriami, and D. simulans (see appendix 9.1). The mean body sizes of these three sympatric species are significantly different from each other, providing a method to detect multiple species. Despite these significant differences in mean, the observed range of each measurement overlaps considerably for each species pair. This study attempts to answer the question so frequently unknown in paleontological samples: if a random sample were collected from this region, would it be possible to determine whether several species were present based solely on an analysis of tooth size?

MEASUREMENTS

The dental measurements were those most commonly taken by paleontologists: maximum lengths and widths of teeth. In all, sixteen dental variables were measured on each specimen: length and width of upper premolars (APP4 and TP4, respectively), upper first molars (APM1, TM1), upper second molars (APM2, TM2), upper third molars (APM3, TM3), lower premolars (app4, tp4), lower first molars (apm1, tm1), lower second molars (apm2, tm2), and lower third molars (apm3, tm3). All dental measurements were taken through an Ehrenreich Photo-Optical Industries Shopscope with 10× magnification and a precision digital positioner. Measurements were taken to the nearest 0.01 mm and entered directly into the computer program Statistica 5.1h. One hundred specimens were remeasured to test the accuracy of the measurement technique. All variables displayed an average relative error (the absolute error of a measurement divided by its original measurement multiplied by 100) of less than 2%.

Specimens were placed into one of six age groups (juvenile, 1, 2, 3, 4, or 5) according to the criteria used in Carrasco (2000a). Because previous work on the teeth of Dipodomys uncovered a significant degree of age variation (Carrasco, 2000b), juvenile and age-5 specimens were eliminated from this study. Despite a significant degree of age variation in the remaining four age groups, all four were maintained to reflect the age variation typically seen in fossil samples.

THE SIMULATION PROCEDURE

1. Reference standards (the samples to which all others will be compared; see below for more information) for each technique and each variable were selected from a set of 50 geographically restricted single-species pooled samples.
2. Using a random number generator, 100 single-species samples at each of four sample sizes (n = 5, 10, 20, and 35) were drawn from the geographically restricted single-species pooled sample with the largest sample size (D. agilis). Each sample was drawn without replacement.
3. The three-species sympatric pooled sample was divided into three groups, each of which contained a different two-species pair: D. gravipes/D. merriami, D. gravipes/D. simulans, and D. merriami/D. simulans. Fifty multiple-species samples were drawn at random from each of the two-species paired pools as well as from the original three-species sympatric pooled sample. This procedure was repeated at four sample sizes (n = 5, 10, 20, and 35) with each sample drawn without replacement. No attempt was made
to equilibrate the different probabilities of selecting one species over another caused by species sample size differences (see appendix 9.1). By not correcting for these specimen number differences, a wider range of relative species percentages in the simulated samples was created. Tests for the presence of multiple species in these simulated pooled-species samples are therefore testing groups with varying sample sizes of each species as might be encountered in a fossil sample.
4. Values for each of the techniques for each variable were calculated for all simulated single-species and pooled-species samples.
5. Values of the reference standards for each technique were then compared to the values of the 400 simulated single-species samples and the 800 pooled-species samples to assess the type-I error rate and the power of each technique.

REFERENCE STANDARDS

As suggested by Cope and Lacy (1995), a single reference standard was selected from a variety of samples that are as closely related and/or ecologically similar to the simulated fossil samples as possible. In this study, the reference set of species was composed of the 50 geographically restricted pooled samples of Dipodomys. Three methods for selecting a single reference standard (referent) from these 50 samples were tested: a reference standard composed of the maximum values for each variable (e.g., Martin and Andrews, 1993; Cameron, 1998), a referent composed of the median values for each variable (Cope and Lacy, 1992, 1995), and a referent using only the values from the sample with the largest sample size (largest-n referent; Cope and Lacy, 1992, 1995). Because the results across all techniques using the maximum-value referent were markedly worse than those of the other referents, and because the median and largest-n referents performed similarly, only the results using the median referent will be reported here. For a detailed description of the results obtained using the maximum and largest-n referents see Carrasco (1999).

DETERMINATION OF TYPE-I ERROR RATE AND POWER

The type-I error rate, the rate at which the null hypothesis is incorrectly rejected, was estimated by comparing each simulated single-species sample to each of the reference standards. A type-I error occurred when the statistic for a given variable in a simulated sample exceeded that of a reference standard. The type-I error rate is the total number of type-I errors divided by the total number of comparisons made. Any variable of a particular technique with a type-I error rate greater than five percent is deemed to have a high type-I error rate.

Power, the probability of rejecting the null hypothesis when it is not true, was tested by comparing each of the 800 simulated pooled-species samples to each reference standard. The power of a technique for a given variable is the number of times the null hypothesis was rejected divided by the total number of comparisons made multiplied by 100. The power ranges from 0 to 100 with higher numbers indicating a more powerful method.

TECHNIQUES

Univariate methods are by far the most commonly used techniques to evaluate a multiple-species hypothesis (e.g., Gingerich, 1974; Cope and Lacy, 1992, 1995; Cope, 1993; Martin and Andrews, 1993; Plavcan, 1993; Fuller, 1996; Cameron, 1998). While multivariate techniques have the potential to offer a different, and perhaps improved, approach toward identifying the taxonomic composition of a fossil assemblage, in practice, fragmentary fossils and small sample sizes often limit the utility of multivariate methods (Plavcan, 1993; Cope and Lacy, 1995). In this paper, only tests that were applicable to such paleontological samples and also could be readily used by investigators not schooled in advanced statistical methods were evaluated. Thus analyses, such as the Fligner and Killeen method advocated by Donnelly and Kramer (1999), which require more complex statistical programming, were not evaluated here. For these reasons, this paper tests the coefficient of variation (CV), max/min index (MI), range as a percentage of the mean (R%), and Levene's test of relative variation (LEV) methods.

The CV is defined as 100*SD/x̄ where SD is the standard deviation of the sample and x̄ is the sample mean.
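
The simulation procedure and the two performance measures described in this section can be sketched as follows, using the CV as the test statistic. This is only an illustration under stated assumptions, not the programs used in the study: the function names, the fixed random seed, and the tooth-width values are hypothetical, and the rejection rule is the bare comparison given above (a sample is flagged when its statistic exceeds that of the reference standard), without the confidence limits that the formula and program methods add.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation: 100 * SD / mean, using the sample SD."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def draw_samples(pool, n, count, rng):
    """Draw `count` samples of size `n`; each sample is drawn without replacement."""
    return [rng.sample(pool, n) for _ in range(count)]


def rejection_rate(samples, referent_value, statistic=cv):
    """Percentage of samples whose statistic exceeds the referent's value."""
    rejections = sum(1 for s in samples if statistic(s) > referent_value)
    return 100.0 * rejections / len(samples)


# Hypothetical tooth widths in mm; these are NOT data from the study.
rng = random.Random(0)
single_species_pool = [round(rng.gauss(1.50, 0.05), 2) for _ in range(132)]
two_species_pool = single_species_pool[:60] + [
    round(rng.gauss(1.75, 0.06), 2) for _ in range(60)
]
referent_cv = cv(single_species_pool)  # stand-in for a reference standard

for n in (5, 10, 20, 35):
    single = draw_samples(single_species_pool, n, 100, rng)
    pooled = draw_samples(two_species_pool, n, 50, rng)
    type_i = rejection_rate(single, referent_cv)  # null true: false rejections
    power = rejection_rate(pooled, referent_cv)   # null false: correct rejections
    print(f"n={n:2d}  type-I error rate={type_i:5.1f}  power={power:5.1f}")
```

Both measures are the same calculation applied to different simulated samples: single-species samples give the type-I error rate, and pooled-species samples give the power. Passing a different function as `statistic` would let the same harness be reused for the other techniques.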

A variety of statistical methods have been proposed to compare CVs (e.g., Lande, 1977; Sokal and Braumann, 1980). While most of these tests lack power at small sample sizes (Cope and Lacy, 1992), the method advocated by Sokal and Braumann (1980) appears to be a reliable method to assess the taxonomic composition of a fossil assemblage (Cope and Lacy, 1992). To compare CVs using this procedure, the standard error of the CV is calculated according to the formula,

\[
\sqrt{\left(\frac{V^{2}}{2n}\right)\left(\frac{n}{n-1} + 2V^{2}\right)\left(1 + \frac{1}{4n}\right)^{2}}
\]

where V is the CV/100 of the referent and n is the sample size of the fossil assemblage (Sokal and Braumann, 1980 as suggested by Cope and Lacy, 1992). The sample-size bias correction factor (V*) used by Sokal and Braumann (1980) was avoided following the recommendations of Cope and Lacy (1992) and Cope (1993). The standard error of the CV is then used to create a one-sided confidence interval for the reference sample that is compared to the CV of the fossil sample.

A second approach tested here is the simulation approach of Cope and Lacy (1992, 1995) and Cope (1993). This approach involves Monte Carlo simulations in which a Pascal computer program (Cope and Lacy, 1992) creates a 95% confidence interval for each CV of the reference group. The CV of the fossil sample is then compared to the corresponding confidence intervals. One-tailed tests were employed and evaluated at three significance levels (5%, 2.5%, and 1%) for the formula-based method of Sokal and Braumann (1980) and two significance levels (5% and 1%) for the simulation approach.

Two different range-based methods were evaluated: MI and R%. MI compares the ratio of the maximum value to the minimum value of the referent pooled sample for a given variable to the same ratio of the fossil sample. If the ratio of the fossil sample exceeds that of the referent, multiple species are hypothesized to be present. Similarly, the null hypothesis is rejected when R%, defined as 100*OR/x̄ (where OR is the observed range of the sample and x̄ is the sample mean), of the fossil sample exceeds that of the referent. The referent used for these two methods has typically been composed of the maximum values of a set of reference species (Martin and Andrews, 1993; Cameron, 1998). However, as noted above, all methods tested that used a maximum referent performed poorly. An alternative approach suggested by Cope and Lacy (1995) is to adopt a simulation approach with MI and R%, identical to their CV simulation approach. As was done with the CV, the results of the simulation approach for MI and R% were evaluated using 5% and 1% significance levels.

Following Schultz (1985), the LEV procedure consists of transforming each value, Xi, of a sample of values to Yi according to the formula, |Xi - medX|/medX where medX is the median of the set of sample values. An ANOVA is then used to determine whether there are significant differences between the group means of the extant referent sample and fossil sample of transformed variates, Yi, for a given variable. The greater the variance in the sample the greater the group mean will be. Comparisons were made at two one-tailed significance levels, 5% and 2.5%.

METHODOLOGICAL ASSUMPTIONS

Each technique is designed to use quantitative data to detect multiple-species samples by comparing the morphological variation of the fossil sample with the variation in single-species extant taxa. The underlying assumption of this method is that fossil taxa are no more variable than living taxa. While this uniformitarian approach has been criticized by some authors (Kelley, 1993; Kieser, 1994), it has been shown by numerous others that there appears to be a consistent pattern of metric dental variation across all mammalian species (e.g., Simpson and Roe, 1939; Gingerich, 1974; Yablokov, 1974; Martin and Andrews, 1993).

Implicit in this assumption is that the sources of variation in both the fossil and extant groups are similar. Temporal, geographical, secondary sexual, and age variation can all significantly affect the total morphological variation of a group. Because it is virtually impossible to assign a sex and sometimes an age to individuals in a fossil sample in which there might be multiple species, extant reference groups should contain
both sexes and encompass most age groups so that the variation effect of these factors is comparable to that of the fossil sample. To limit the effects of geographic variation, a clear understanding of the geographic range from which the fossil and extant samples are collected should be obtained. While most single fossil assemblages contain individuals from a relatively limited geographic region (Martin and Andrews, 1993), it is possible that a single sample could have been collected from a wide catchment area. In particular, fossil groups that consist of multiple samples from different localities should be tested with caution. If this is the case, it might be best to compare the fossil group to an extant reference group from a large geographic range (Albrecht and Miller, 1993; Martin and Andrews, 1993). At the same time, because of the generally limited geographic extent of fossil assemblages, extant comparison groups should be from a limited geographic range to increase the power of the technique. Unfortunately, temporal variation is difficult to equilibrate between fossil and living groups. Although Martin and Andrews (1993) assert that time-averaging needs to be demonstrated, rather than assumed, every attempt should be made to identify and limit the temporal range of the fossil sample.

In addition, all of the techniques presented here can be used to test only a single-species null hypothesis. Previous studies have shown that the morphological variation in a multiple-species sample is often indistinguishable from that of a single-species sample (Cope and Lacy, 1992, 1995; Cope, 1993; Plavcan, 1993). Therefore, the presence of a single-species sample can never be conclusively demonstrated. In addition, falsification of the single-species null hypothesis does not necessitate that a fossil sample is composed of more than one species (Donnelly and Kramer, 1999). Falsification indicates only that the sample has an abnormally high level of variation, which might be accounted for by several causes such as those outlined above. Therefore, it is necessary to minimize the possibility of such confounding factors.

RESULTS

The average type-I error rate and power results across all sample sizes tested are displayed in Tables 9.1 and 9.2. Because these average values reflect the overall pattern found in each technique, results will not be discussed by sample size. Sample-size effects will be addressed in a later publication.

TABLE 9.1
Average Type-I Error Rate of Each Method by Variable
See Methods section for dental abbreviations. CVF1, CVF2.5, and CVF5, coefficient of variation formula methods using 1%, 2.5%, and 5% significance levels, respectively; CVP1 and CVP5, coefficient of variation program methods using a 1% and 5% significance level, respectively; MI1 and MI5, max/min index methods using a 1% and 5% significance level, respectively; R%1 and R%5, range as a percentage of the mean methods using a 1% and 5% significance level, respectively; LEV2.5 and LEV5, Levene's test of relative variation methods using a 2.5% and 5% significance level, respectively.

TABLE 9.2
Average Power of Each Method by Variable
See Methods section for dental abbreviations and table 9.1 for statistical abbreviations.

TYPE-I ERROR RATE

The CV formula type-I error rates were generally low except those that used a one-tailed 5% significance level. This 5% significance method had average type-I error rates greater than 5 in nine variables. The 1% formula method had a low overall average type-I error rate (2.5), while the 2.5% method rate averaged about 5. The average type-I error rate for the 5% CV program method was consistently higher than any other CV method tested, at 7 or greater across the majority of variables. The 1% program method had lower average values, which were slightly greater than those of the 1% CV formula method. Across all CV methods, the lowest rates were seen in the upper and lower premolars, TM1, TM2, tm3, and apm1.

The average MI type-I error rates of the 5% method frequently exceeded 5, while those of the 1% technique generally had average rates of less than 2. As seen in the CV methods, upper and lower premolar dimensions, TM1, TM2, tm3, and apm1 had low type-I error rates. Similar to the MI methods, the 5% R% method had high average type-I error rates (average rate of 10.1), while the 1% method had generally low average rates of 3 or less. The variables with the lowest type-I error rates were the lower premolar dimensions, APM1, TM1, apm1, and tm3.

The overall average rate in the 5% LEV method slightly exceeded the 5.0 critical level whereas the 2.5% method average was less than 3.0. All premolar dimensions, TM1, TM2, apm2, and tm2 had low type-I error rates.

POWER OF THE TECHNIQUES

Both the 2.5% and 5% CV formula methods had an overall average power greater than 30. The 1% method had an average power of 23.8. The 5% program method was the most powerful CV technique, with 10 variables having an average power greater than 40. The 1% program method had a power average that was comparable to those of the 1% and 2.5% formula techniques. Across all CV techniques, TM2 and tm2 had the highest powers, with average values greater than 40 for all methods.

The 5% MI method had an overall average power of 22.8, and the 1% method averaged 9.6. Slightly more powerful, the 5% R% method averaged 31.9 while the 1% technique had an average power of 16.8. TM2 and tm2 displayed the greatest powers in both the MI and R% methods.

LEV methods exhibited average power comparable to the 5% R% and MI methods with an overall average power just less than 30 in the 5% LEV method and around 20 in the 2.5% method. TM2, TM3, tm1, and tm2 exhibited the most power.

DISCUSSION

Previous work has suggested that the best variables to distinguish closely related sympatric fossil taxa are posterior tooth dimensions, in particular those of the first and second molars (Gingerich, 1974; Cope and Lacy, 1992, 1995; Cope, 1993; Plavcan, 1993). These conclusions are confirmed in this study, with a few notable exceptions. Five of the eight first and second molar dimensions consistently had powers greater than 20 and type-I error rates less than 5: APM1, TM1, TM2, apm1, and tm2. However, APM2 and tm1 were among the variables with the highest type-I error rates, while apm2 had lower powers. In addition, contrary to the findings of previous workers, the two variables TP4 and tm3 had powers and type-I error rates similar to those of the first and second molars. Another item to note is that the width dimension of every tooth exhibited a higher power than the accompanying length dimension while the type-I error rates of the two classes of dimensions were similar. This result is opposite to that found in primates, where lengths were found to be more powerful than widths (Cope, 1993; Cope and Lacy, 1995), but in line with previous work on kangaroo rats that found a greater degree of variability in posterior length dimensions relative to width dimensions (Carrasco, 2000b).

Overall, no statistical method was clearly more useful than all others. Previous workers had found the CV formula and program methods, using a 95% confidence interval, to be quite powerful while committing few type-I errors (Cope and Lacy, 1992, 1995; Cope, 1993). While displaying a considerable degree of power, the type-I error rate of these methods exceeded the stated 0.05 error rate for many variables, especially the posterior tooth dimensions. This type-I error rate contrasts with the low type-I error rate found by Cope (1993) and Cope and Lacy (1992, 1995) in their work on cercopithecine primates, suggesting that the 95% methods do not always meet the stated error rate of the analyses and are therefore not universally appropriate. The results of this study also confirm the suspicions of other workers who have claimed that the use of the CV to assess the taxonomic diversity of a fossil assemblage can result in an unacceptably high number of type-I errors (Martin and Andrews, 1993; Donnelly and Kramer, 1999). This poor performance may be due to comparing distributions that are similar in shape and either normal, leptokurtic, or strongly skewed (Donnelly and Kramer, 1999). In a cursory examination of the underlying distributions, no significant correlation was found between the shape of the distributions in the CV type-I error rate analyses and the overall CV type-I error rates (this lack of correlation was also found in the R%, MI, and LEV techniques). However, a more thorough analysis of these distributions, which is beyond the scope of this paper, is needed to reach a more definitive conclusion on how the underlying distribution patterns might have affected the results. Nevertheless, while there is a slight sacrifice in power, a more reasonable type-I error rate was recovered in the 99% confidence interval methods.

Results of the range-based program methods (MI and R%) were different from those found in previous studies. Employing the 95% confidence intervals of these programs, Cope and Lacy (1995) found that these techniques (using the largest-n referent) produced an acceptable type-I error rate with a power slightly less than that found with the CV program methods. They also concluded that range-based program methods using a median referent had an unacceptably high type-
I error rate. In this study, the average type-I error rate of the median referent MI techniques was less than that of the largest-n referent. In addition, the empirical type-I error rates of all of the 95% confidence interval program methods exceeded the stated 0.05 rate of the analyses. The techniques that used the 99% confidence interval had much lower type-I error rates, but the lowest powers of all methods tested. Overall, the range-based program methods performed more poorly than the 1% and 2.5% CV formula and program methods.

The results of Levene's test of relative variation were promising. These methods displayed a relatively low type-I error rate while exhibiting average powers greater than 20.0. The 5% method had slightly elevated type-I error rates in four of the eight first and second molar tooth dimensions, limiting its utility. However, the 2.5% LEV average type-I error rate and power were acceptable and comparable to those of the CV 1% program method. The results of this study support the conclusions of Schultz (1985) and Donnelly and Kramer (1999).

CONCLUSIONS AND RECOMMENDATIONS

Utilization of statistical methods to detect the presence of closely related sympatric species from a single morphologically homogeneous fossil assemblage composed solely of teeth is a common procedure in paleoanthropology (e.g., Gingerich, 1974; Cope, 1993; Martin and Andrews, 1993; Plavcan, 1993). The purpose of this study was to determine which, if any, of the most common statistical methods was the most reliable (i.e., displayed an average type-I error rate < 5.0 and power > 20.0) within a nonprimate taxon, in this case the extant heteromyid rodent Dipodomys. This study shows that no technique tested displayed a type-I error rate that consistently matched the stated error rate of the analysis while maintaining reasonable power. Therefore, the only techniques that satisfied the empirical criteria were the CV (1% formula and program methods) and Levene's test of relative variation (2.5% method). Of these three methods, the CV program method was slightly more powerful whereas the LEV technique had a lower type-I error rate. On the other hand, the majority of the methods, including the range-based statistics, Levene's 5% technique, and the CV methods that employed 5% significance levels tended to have low average power (< 10) and/or high type-I error rates and should be avoided when testing a single-species hypothesis in heteromyids.

This study also points to a need to select variables carefully. Many workers have suggested the use of upper and lower posterior tooth dimensions (P4/p4-M2/m2), particularly first and second molar dimensions (Gingerich, 1974; Cope and Lacy, 1992, 1995; Cope, 1993). While the results of this study generally confirm these suggestions, there were particular dimensions that had very high type-I error rates (APM2 and tm1) or low power (apm2) across all techniques tested. These differences are likely the result of the different taxonomic groups investigated; previous results were based primarily on primates. Because of these taxonomic differences, a clear understanding of the variation of each variable in the group being studied needs to be obtained prior to employing any of the statistical methods. Dimensions that exhibit a low degree of intraspecific variability and high interspecific variability are probably the most reliable dimensions to use (Cope and Lacy, 1992, 1995). For heteromyid rodents, the widths of the upper second molar (TM2) and lower second molar (tm2) were the most reliable. In addition, while it is not wise to test all dimensions available (to limit increases in the studywise type-I error rate), at least three or four variables (preferably both lengths and widths) should be tested using one of the methods employed here due to the possibility of a type-I error occurring. If only one dimension leads to a rejection of the null hypothesis, conclusions regarding the taxonomic composition of the sample should be tempered. Conversely, several dimensions that lead to a rejection of the null hypothesis provide a solid statistical foundation to conclude that multiple species may be present in a fossil assemblage.
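
Two of the recommended procedures, the CV formula method at the 1% level and the LEV transformation, might be applied to a single dental variable roughly as sketched below. Several things here are assumptions rather than the author's implementation: the function names are hypothetical; the standard error uses one reading of the Sokal and Braumann (1980) formula given in the Techniques section; a normal quantile is assumed for the one-sided confidence limit; and the LEV function stops at the ANOVA F statistic, because the Python standard library has no F distribution from which to obtain a critical value.

```python
import math
from statistics import NormalDist, mean, median


def cv_standard_error(v, n):
    """Standard error of the CV, with V given as a proportion (CV/100).

    ASSUMPTION: one reading of the Sokal and Braumann (1980) formula,
    sqrt((V^2 / 2n) * (n / (n - 1) + 2V^2)) * (1 + 1 / 4n).
    """
    return math.sqrt(v**2 / (2 * n) * (n / (n - 1) + 2 * v**2)) * (1 + 1 / (4 * n))


def cv_rejects_single_species(fossil_cv, referent_cv, n_fossil, alpha=0.01):
    """One-sided check: does the fossil CV (percent) exceed the referent's upper limit?"""
    v = referent_cv / 100.0
    z = NormalDist().inv_cdf(1.0 - alpha)  # ASSUMPTION: normal quantile
    upper_limit = 100.0 * (v + z * cv_standard_error(v, n_fossil))
    return fossil_cv > upper_limit


def relative_deviations(values):
    """LEV transform: Y_i = |X_i - median(X)| / median(X)."""
    med = median(values)
    return [abs(x - med) / med for x in values]


def lev_f_statistic(referent, fossil):
    """One-way ANOVA F on the transformed variates of the two groups.

    Returns (F, df_between, df_within, fossil_mean_is_larger).
    """
    groups = [relative_deviations(referent), relative_deviations(fossil)]
    grand = mean(groups[0] + groups[1])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_within = len(groups[0]) + len(groups[1]) - 2
    f_value = ss_between / (ss_within / df_within)
    return f_value, 1, df_within, mean(groups[1]) > mean(groups[0])
```

With only two groups, F equals the square of the two-sample t statistic, so a one-tailed test at 2.5% corresponds to F exceeding the 5% critical value of F(1, df_within) while the fossil group mean is the larger one. The critical value would have to come from a table or a statistics package.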
under the guidance of Malcolm C. McKenna. I thank Malcolm for always allowing me to
pursue my own individual research interests while providing valuable advice and guidance
along the way. I also thank L.F. Marcus, J.H. Wahlert, A.D. Barnosky, B.P. Kraatz and
R.S. Feranec for their comments and suggestions. My appreciation goes to the following
curators and collections managers for allowing me access to specimens under their care:
F. Brady and R.D.E. MacPhee (American Museum of Natural History, New York), D.S. Janiger
(Los Angeles County Museum, Los Angeles), L. Abraczinskas and P. Hildebrandt (Michigan
State University Museum, East Lansing), B. Stein (Museum of Vertebrate Zoology,
Berkeley), P. Unitt (San Diego Natural History Museum, San Diego), L.K. Gordon
(Smithsonian Institution, Washington, D.C.), and G.D. Baumgardner (Texas Cooperative
Wildlife Collection, College Station). I would also like to thank M.G. Lacy for
providing copies of his DOS computer programs to test single-species hypotheses. This
research was funded in part by a National Science Foundation Graduate Fellowship, a
Faculty Fellowship from Columbia University, an American Museum of Natural History
Theodore Roosevelt Grant, and a Ford Foundation Dissertation Fellowship.

REFERENCES
Albrecht, G.H., and J.M.A. Miller. 1993. Geographic variation in primates: a review with implications for interpreting fossils. In W.H. Kimbel and L.B. Martin (editors), Species, species concepts, and primate evolution: 123-161. New York: Plenum Press.
Barnosky, A.D. 1986. New species of the Miocene rodent Cupidinimus (Heteromyidae) and some evolutionary relationships within the genus. Journal of Vertebrate Paleontology 6(1): 46-64.
Cameron, D.W. 1998. Anatomical variability and systematic status of the hominoids currently allocated to the African Dryopithecinae. Homo 49(2): 101-137.
Carrasco, M.A. 1998. Variation and its implications in a population of Cupidinimus (Heteromyidae) from Hepburn's Mesa, Montana. Journal of Vertebrate Paleontology 18(2): 391-402.
Carrasco, M.A. 1999. Morphological variation in the dentition of kangaroo rats (genus Dipodomys) and its use in distinguishing species in the fossil record. Ph.D. dissertation, Columbia University, New York.
Carrasco, M.A. 2000a. Species discrimination and morphological relationships of kangaroo rats (Dipodomys) based on their dentition. Journal of Mammalogy 81(1): 107-122.
Carrasco, M.A. 2000b. Variation in the dentition of kangaroo rats (genus Dipodomys) and its implications for the fossil record. Southwestern Naturalist 45(4): 490-507.
Cope, D.A. 1993. Measures of dental variation as indicators of multiple taxa in samples of sympatric Cercopithecus species. In W.H. Kimbel and L.B. Martin (editors), Species, species concepts, and primate evolution: 211-237. New York: Plenum Press.
Cope, D.A., and M.G. Lacy. 1992. Falsification of a single species hypothesis using the coefficient of variation: a simulation approach. American Journal of Physical Anthropology 89(3): 359-378.
Cope, D.A., and M.G. Lacy. 1995. Comparative application of the coefficient of variation and range-based statistics for assessing the taxonomic composition of fossil samples. Journal of Human Evolution 29(6): 549-576.
Donnelly, S.M., and A. Kramer. 1999. Testing for multiple species in fossil samples: an evaluation and comparison of tests for equal relative variation. American Journal of Physical Anthropology 108(4): 507-529.
Fuller, K. 1996. Analysis of the probability of multiple taxa in a combined sample of Swartkrans and Kromdraai dental material. American Journal of Physical Anthropology 101(3): 429-439.
Gingerich, P.D. 1974. Size variability of the teeth in living mammals and the diagnosis of closely related sympatric fossil species. Journal of Paleontology 48(5): 895-903.
Kelley, J. 1993. Taxonomic implications of sexual dimorphism in Lufengpithecus. In W.H. Kimbel and L.B. Martin (editors), Species, species concepts, and primate evolution: 429-458. New York: Plenum Press.
Kieser, J.A. 1994. Falsification of a single-species hypothesis by using the coefficient of variation: a critique. American Journal of Physical Anthropology 95(1): 95-97.
Lande, R. 1977. On comparing coefficients of variation. Systematic Zoology 26(2): 214-217.
Martin, L.B., and P. Andrews. 1993. Species recognition in middle Miocene hominoids. In W.H. Kimbel and L.B. Martin (editors), Species, species concepts, and primate evolution: 393-427. New York: Plenum Press.
Plavcan, J.M. 1993. Catarrhine dental variability and species recognition in the fossil record. In W.H. Kimbel and L.B. Martin (editors), Species, species concepts, and primate evolution: 239-263. New York: Plenum Press.
Schultz, B.B. 1985. Levene's test for relative variation. Systematic Zoology 34(4): 449-456.
Simpson, G.G., and A. Roe. 1939. Quantitative zoology. New York: McGraw-Hill Book Company, Inc.
Sokal, R.R., and C.A. Braumann. 1980. Significance tests for coefficients of variation and variability profiles. Systematic Zoology 29(1): 50-66.
Wahlert, J.H. 1993. The fossil record. In H.H. Genoways and J.H. Brown (editors), Biology of the Heteromyidae. Special Publication American Society of Mammalogists 10: 1-37.
Yablokov, A.V. 1974. Variability of mammals. New Delhi: Amerind Publishing Co. Pvt. Ltd.

APPENDIX 9.1
SPECIMENS EXAMINED

The following is the locality information for the multispecies sample of Dipodomys (274
total specimens) from northern Baja California. All specimens were complete skulls.
Information is arranged alphabetically by species followed by museum. Abbreviations:
AMNH, American Museum of Natural History, New York; LACM, Los Angeles County Museum;
MVZ, Museum of Vertebrate Zoology, Berkeley; SD, San Diego Natural History Museum; USNM,
United States Natural History Museum; n, sample size.

Dipodomys gravipes (n = 62): Bahía de San Quintín (LACM 33433); Colonia Guerrero, across
WNW from Red Cliffs, Santo Domingo entrance (LACM 32111, 32113); near entrance to Santo
Domingo Canyon, Colonia Guerrero (LACM 38174); Agua Chiquita, 4 mi. E of San Quintín
(MVZ 35704); San Quintín (MVZ 36234; SD 15953; USNM 138910); San Ramon, mouth of Santo
Domingo River (MVZ 35655, 35657-35659, 35661-35664, 35666, 36214, 36233); Socorro (MVZ
49860, 49863); Socorro, 20 mi. S of San Quintín (MVZ 49854, 49856, 49858); Santa María
near San Quintín (SD 8532, 18513); 3 mi. S of San Telmo (SD 15821, 15822); San Quintín
Plain (SD 4997, 4999, 5023-5026, 5035-5037, 22347-22350); Aqua Chiquita Canyon (SD
22346); 1 mi. S of San Ramon (SD 4906); Santo Domingo (SD 4682, 4683, 4704, 4715, 4823,
4885, 4941, 4945, 4946, 4950, 4976, 22351-22356; USNM 245884); mouth of Agua Chiquita
Canyon, San Quintín Plain (USNM 245885).

Dipodomys merriami (n = 85): Between El Socorro and El Counsuelo, Hwy. 1, Arroyo San
Quintín (LACM 38172); Aqua Chiquita, 4 mi. E of San Quintín (MVZ 35702); Arroyo Nueva
York, 15 mi. S of Santo Domingo (MVZ 36244, 36245, 36247); San Quintín (MVZ 49880,
49882-49884, 49886, 49887, 49889-49894; SD 1218, 4995, 4996, 15955, 22207; USNM 138911,
138914, 138915, 138917, 138921, 139827, 139829); Santo Domingo (MVZ 36236, 36237; SD
4668, 4669, 4678-4680, 4684, 4693, 4717, 22210, 22211); 1 mi. S of San Ramon (SD 4912,
4913, 4924, 22209); N end of San Quintín Plain (SD 4937-4939, 4960-4962, 22206, 22208);
San Quintín Plain (SD 5042); Santa María near San Quintín (SD 8533, 18519-18521); 7 mi.
SE of San Quintín (SD 15800); 10 mi. E of San Quintín (SD 18576-18578, 18587-18589,
18604, 18617, 18619-18622); NE side of San Quintín Bay (SD 19534-19543); 2 mi. S of Old
Mill on the N side of San Quintín Bay (SD 19606); near Rock Bluff, 8 mi. N of Cape San
Quintín (SD 19607, 19608); 7 mi. N and 0.5 mi. W of Cape San Quintín (SD 20074).

Dipodomys simulans (n = 127): Colonia Guerrero, across WNW from Red Cliffs, Santo
Domingo entrance (LACM 32109, 32110); Santo Domingo (LACM 1171; MVZ 36227, 36229; SD
4671, 4673-4675, 4689-4692, 4695, 4698, 4705-4708, 4714, 4886, 4901, 4903, 22335-22341);
Valladares (MVZ 35681-35687); San Antonio Ranch, Santo Domingo River (MVZ 35689); San
Jose (MVZ 35695, 35696, 36073, 36075-36082); Colnett (MVZ 36088); San Quintín (MVZ
49831, 49832; SD 4992, 5003, 5005, 5007, 5030, 5031, 15956, 15957, 22333, 22334; USNM
138909, 139819, 139820, 139822, 139823, 139826, 245886); San Telmo (MVZ 35668-35678,
35680, 35921, 36219, 36220, 36222, 36223-36225; USNM 139830); Socorro, 20 mi. S of San
Quintín (MVZ 49861, 49862); 2.4 km S and 5.0 km W of Mission Santo Domingo (MVZ 153965,
153966, 153968, 153969); 2.4 km W of Mission Santo Domingo (MVZ 153960-153964); 5 mi. W
and 1.25 mi. S of San Telmo de Abajo (MVZ 148091-148100); 1 mi. S of San Ramon (SD 4907,
4925); N end of San Quintín Plain (SD 4958); Santa María near San Quintín (SD 18514); 10
mi. E of San Quintín (SD 18585, 18603, 18615, 18616); NE side of San Quintín Bay (SD
19533); near Rock Bluff, 8 mi. N of Cape San Quintín (SD 19605); Aqua Chiquita Canyon
near San Quintín (SD 22327-22332); 20 mi. SE of San Telmo (USNM 528823).
