51(5):806–816, 2002
DOI: 10.1080/10635150290102483
Abstract.—A formal method was developed to determine areas of endemism. The study region is
divided into cells, and the number of species that can be considered as endemic is counted for a given
set of cells (= area). Thus, the areas with the maximum number of species considered endemic are
preferred. This is the first method for the identification of areas of endemism that implements an
optimality criterion directly based on considering the aspects of species distribution that are relevant
to endemism. The method is implemented in two computer programs, NDM and VNDM, available
from the authors. [Biogeography; endemicity; optimality criterion.]
2002 SZUMIK ET AL.—OPTIMALITY CRITERION FOR ENDEMICITY 807
Morrone (1994) and Linder (2001) proposed more detailed operational procedures, scoring
presence/absence of each species in each cell of a grid. Both authors proposed to use counts of
endemic species as a criterion to evaluate possible areas, although they were not completely
specific on how to decide whether or not a species can be considered endemic. Using counts of
endemic species to select from among all possible conclusions presents considerable computational
difficulties. Instead of using those counts to select from among all possible sets of cells,
Morrone and Linder used them to select from among the sets of cells produced by a parsimony
analysis or UPGMA clustering using the Jaccard similarities. As both Morrone and Linder were well
aware, not all the species appearing as “synapomorphies” of a given set of cells will correspond to
endemic species, because they may also be synapomorphies of many other (not closely related and
geographically distant) groups. This possibility violates the main requirement for endemicity, that
of being restricted to the area. Parsimony is indeed an appropriate criterion for phylogenetic
reconstruction, but it cannot be adapted to a field with completely different goals and premises.
Likewise, UPGMA may prefer groups of cells with no endemic species over groups with several
endemics. Thus the counts of species endemic to different sets of cells should be used to select
from among all possible sets, not only those sets that parsimony or UPGMA happen to produce.

AN OPTIMALITY CRITERION

A method to determine areas of endemism based on an optimality criterion must provide a way to
assign a value of endemicity, or score, to a given area (= set of grid cells) regardless of how that
area was found or hypothesized. For different definitions of an area, there will be different
numbers of species that can be considered endemic. For example, a species will satisfy the
requirement for endemicity if the area comprises the same cells where it is distributed but will not
satisfy the requirement if the area comprises half those cells. Thus, for different sets of cells,
there will be different numbers of species that can be considered as “endemic,” i.e., they will have
different scores of endemicity. A natural criterion of optimality is thus provided by counting the
species that can be considered as endemic, given the area (and the species distributions).
Obviously, from among possible areas, those with the highest scores of endemicity should be
preferred.

FIGURE 3. An area (including five grid cells) with score 2 under criterion 1 and score 3 under
criterion 2. Under criterion 1, species X, even if occurring in each cell of the area, does not
contribute to the score because it is also found in cells outside the area. Under criterion 2,
species X contributes to the score. All the cells in the area have identical species composition.

To determine how many species appear as endemic, endemicity itself must be determined for each
species in a formalized manner, which can be done in several ways. Four possible criteria have been
examined, from a very strict or ideal concept of endemism (criteria 1 and 2, with a very high
congruence required between the species distribution and the area) to less rigorous but more
realistic requirements (criteria 3 and 4, which allow for some incongruence). Because each of the
criteria is a relaxation of the preceding one(s), the score under each criterion will always be
equal to or greater than the score under the preceding criteria. The data entry is done following
the steps outlined by Morrone (1994), by plotting species localities on a map with a grid, except
that the spatial location of the cell in the grid (as row, column; see Fig. 3) must also be
considered. The method has been implemented in two computer programs, NDM and VNDM (Goloboff, 2001).
NDM is the basic search engine, and VNDM is a program that helps viewing and diagnosing (e.g.,
finding out which species contribute to the score). Optionally, the data can be read as
coordinates, and internally converted by the
programs to a grid of a specified size. In that case, the programs will consider a species as
present in a cell if it is present in at least one point (= locality record) inside the cell; a
point on the edge (or corner) of a cell indicates presence of the corresponding species in the two
(or four) adjacent cells. Optionally, it is also possible to consider each point as having a
“radius” equal to some (user defined) percentage of the cell width or height, so that a point very
close to the edge (or corner) of a cell can be considered as also present in the adjacent cell(s).

As the criteria are defined here, they cannot be applied to disjunct areas; only areas where all
cells are contiguous can be evaluated. Although it would of course be desirable to have a criterion
to evaluate disjunct areas, this first approximation to the problem does not allow such an
evaluation.

For a more explicit definition of the criteria, a simple notation is used:

$A$ = an area (= set of cells);
$C_N$ = nth cell that belongs to $A$;
$C_n$ = nth cell that does not belong to $A$;
$N_A$ = set of cells not adjacent to $A$;
$S_N$ = set of species present in $C_N$;
$S_n$ = set of species present in $C_n$;
$X_A$ = set of species that contribute to the score of area $A$.

In all cases, the score of an area $A$ will be the cardinality of $X_A$. The complement of a set
$S$ is denoted as ${\sim}S$.

First Criterion ($E_1$)

This criterion assumes that the distribution of a species must adjust perfectly to the area to
contribute to the score. For all $C_I$ in $A$, $S_I$ must be identical; if some $S_I \neq S_J$, then
$X_A = \emptyset$; otherwise
$X_A = (S_J \cap S_K \cap \dots \cap S_N) \cap {\sim}(S_j \cup S_k \cup \dots \cup S_n)$.
That is, a species contributes to the score if it is found in the area and nowhere else, and each of
the cells in the area has exactly the same species composition. Figure 3 is an example; the area
formed by the cells 0–2, 0–3, 1–1, 1–2, and 1–3 has an endemicity score $E_1 = 2$.

Second Criterion ($E_2$)

This criterion is similar to the preceding one, but a species can contribute to the score if
present in some cell outside the area as long as the cell is adjacent to the area. Thus, like
before, $S_I$ must be identical for all $C_I$ in $A$; if some $S_I \neq S_J$, then
$X_A = \emptyset$; otherwise, define

$B = (S_J \cap S_K \cap \dots \cap S_N)$;
$V = (S_j \cup S_k \cup \dots \cup S_n)$ (for all $j$, $k$, and $n$ that belong to $N_A$);
$X_A = B \cap {\sim}V$.

Under this criterion therefore it is not required that all the species contributing to the score
have identical distributions. The example of Figure 3 will have a score $E_2 = 3$, contributed by
the distributions of species X, Y, and Z; X contributes to the score because it is found outside the
area but only in neighboring cells (2–1 and 2–2).

Third Criterion ($E_3$)

This criterion is similar to the preceding criterion but drops the requirement that $S_I$ must be
identical for all $C_I$ in $A$. Thus, it is not required that all cells in $A$ have identical
species composition. However, because $X_A$ is determined as with the previous criterion, only
species occurring in each and every one of the cells in $A$ will contribute to the score. Figure 4
shows an example; the area formed by cells 0–1, 0–2, 0–3, 1–1, and 1–2 has a score $E_3 = 2$ (by
species X and Y).

Fourth Criterion ($E_4$)

Under criteria 1 through 3, a species can contribute to the score only if it is present in each and
every one of the cells of the area. A more realistic criterion, however, must take into account the
fact that a species may be absent from a given cell because of poor collecting effort or partial
extinction (as in urban

FIGURE 4. An area with score 2 under criterion 3. Not all cells in the area have identical species
composition. Species X and Y contribute to the score; species Z does not because it is found in
only some cells of the area.
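
As an illustration only, criteria 1 through 3 can be written with ordinary set operations. The sketch below is not NDM code. The function names (`neighbours`, `contributing_species`), the dictionary layout (a (row, column) cell mapped to its set of species), and the assumption that adjacency includes diagonal neighbours are choices made here for the example. The toy grid is loosely modelled on the situation described for Figure 3, with a hypothetical extra species W placed far from the area.

```python
from itertools import product


def neighbours(cell):
    """Eight cells surrounding (row, col); diagonal adjacency is an assumption."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)}


def contributing_species(area, grid, criterion):
    """Return X_A under criterion 1, 2 or 3 (hypothetical helper, not NDM code).

    grid maps (row, col) -> set of species; cells missing from grid are empty.
    """
    area = set(area)
    inside = [grid.get(cell, set()) for cell in area]
    # Criteria 1 and 2 require identical species composition in every cell of A.
    if criterion in (1, 2) and any(s != inside[0] for s in inside):
        return set()
    common = set.intersection(*inside)          # B: present in every cell of A
    adjacent = set().union(*(neighbours(c) for c in area)) - area
    if criterion == 1:
        # Any record outside the area disqualifies the species.
        excluded_cells = [c for c in grid if c not in area]
    else:
        # Criteria 2 and 3: only records in nonadjacent cells (N_A) disqualify.
        excluded_cells = [c for c in grid if c not in area and c not in adjacent]
    outside = set().union(*(grid[c] for c in excluded_cells))   # V
    return common - outside                     # X_A = B minus V


# Toy data: five-cell area, X also recorded in two neighbouring cells.
area = {(0, 2), (0, 3), (1, 1), (1, 2), (1, 3)}
grid = {cell: {"X", "Y", "Z"} for cell in area}
grid[(2, 1)] = {"X"}
grid[(2, 2)] = {"X"}
grid[(4, 4)] = {"W"}

scores = {k: len(contributing_species(area, grid, k)) for k in (1, 2, 3)}
print(scores)  # {1: 2, 2: 3, 3: 3}
```

With this toy grid, Y and Z contribute under criterion 1, and X is added under criterion 2 because its only outside records lie in cells adjacent to the area. The scores never decrease from one criterion to the next, as expected from each criterion being a relaxation of the preceding one.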
810 SYSTEMATIC BIOLOGY VOL. 51
During the search stage, NDM actually examines only the areas having more than a single cell (those
with single cells can be easily examined later). To enumerate all possible combinations of cells,
NDM starts with an empty set. To this set, it adds first cell number 0 (upper left corner) and tries
all combinations of the remaining cells together with cell 0. Then it eliminates cell 0, adds cell
1, and tries all possible combinations of the remaining cells. This procedure is repeated until the
first cell included in the set is the one before the last cell in the grid (lower right corner), in
which case only one two-cell set can be generated. The possible combinations of cells are always
examined in the same orderly fashion.

The procedure described allows generation of all possible sets of cells. Each of the combinations
must be evaluated for continuity (disjunct areas are ignored) and, if continuous, assigned a score
under the criterion (or criteria) in effect. Actual examination of each possible combination in this
fashion is extremely time consuming (requiring several hours even for small data sets), but many of
the sets can be implicitly rejected by predicting that they will be discontinuous or that they will
have a low number of endemic species.

Discontinuous sets will have gaps. For example, in a grid with eight columns, the set formed by
cells 0, 1, 5, and 6 is discontinuous. The mere existence of some gap (such as 2, 3, and 4 in the
example) is not enough to deduce that any resulting set will be discontinuous; e.g., adding cells
10, 11, and 12 to the original set will make it continuous. However, whenever a gap is longer than
the number of columns plus 2, any resulting set produced by adding subsequent cells will be
discontinuous (e.g., in an eight-column grid, the set formed by 0 and 10 is discontinuous, and no
possible addition of cells beyond 10 can make it continuous). When a partial combination of cells
contains such a long gap, all the sets that result from adding further cells to that partial
combination are ignored.

For criteria 1 through 3, predicting which species can potentially contribute to the score is easy
because these criteria require that a species be found in each and every cell of the area to count
as endemic. As each cell is added to the set, the intersection of the species contained in the new
cell with those previously included is calculated. If the intersection for a set of cells is empty,
it follows that the intersection of any possible additional combination of cells will also be empty,
and then those additional cells are never added. Actually, NDM checks whether the partial
intersection has fewer members than a given minimum score; obviously, searching for areas with
larger scores speeds up calculations because it interrupts calculations earlier.

For criterion 4, the calculations are more difficult because a species can contribute to the score
even if it does not occur in each cell of the area. A good lower bound on the score can be obtained
by calculating an enlarged distribution for each species (done before the search itself starts and
stored in memory). For such enlarged distribution, a cell is considered as having the species
present if the species satisfies the requirements of criterion 4 in that cell (i.e., actually
present in at least one adjacent cell, absent in no more than Q cells). The intersection of the
species (with enlarged distributions) in the cells of a given set of areas will be a superset of the
set of species actually giving a score under criterion 4 for that area. Thus, if the number of
members in the intersection of the species occurring in the enlarged distribution in a set of cells
is less than the minimum score, it follows that no set formed by adding further cells can have an
$E_4$ equal to or greater than the minimum score. This is true as long as the distributions have
been enlarged by allowing up to seven empty cells around a given cell and not checking around cells
actually occupied. If the number of allowed empty cells is less, the number of species contributing
to the score can be underestimated because a cell may be surrounded by some number of empty cells in
the full grid but by a smaller number when an area is defined (if the area excludes some of the
cells that did not have the species; only the empty cells belonging to the area are counted). Thus,
some areas with a positive score (optimal or not) may be missed during the search. The likelihood of
missing positive areas in a given case depends on the relative numbers of empty cells used to create
the enlarged species distributions and to evaluate the areas. Allowing up to five empty cells when
enlarging species distributions is unlikely to create errors if the areas are to be evaluated
allowing up to two empty cells but is very likely to create errors if the areas are
to be evaluated allowing up to five empty cells.

The enlargement of species distributions allowing for fewer empty cells can find areas that are
contractions of the actual optimal areas, i.e., areas that are produced by eliminating some cells
from the actual optimal area. Some of these errors (not necessarily all) will be remedied if a
heuristic addition of cells, one at a time, is done for each of the cells found, retaining (and
submitting to the same procedure) each of the enlarged areas that has a positive score.

Additional speed can be obtained by identifying in advance species that cannot contribute to the
score of a given area by virtue of occurring in nonadjacent cells. A cell that is columns + 2
positions before the first cell in a set and a cell that is columns + 2 positions beyond the last
cell in the set will by necessity be discontinuous (i.e., nonadjacent) to the area. For each cell
$i$, a set $F_i$ can be calculated as $F_i = F_{i-1} \cup S_i$ and a set $B_i$ as
$B_i = B_{i+1} \cup S_i$ (where $S_i$ is the set of species occurring in cell $i$); this calculation
is done before the search starts. Then, during the search, if the first cell in the set is in
position $i$, any species occurring in the set $F_{i-(\mathrm{columns}+2)}$ cannot contribute to the
score and can be eliminated from the set of species potentially contributing to the score. (As
before, if fewer species than the minimum score occur in that set, there is no need to form all the
areas that result from adding further cells to the present set of cells.) Before evaluating a given
area, all the species in $B_{j+\mathrm{columns}+2}$ (where $j$ is the last cell of the set) can be
eliminated from the set of species potentially contributing to the score. (This saves less time than
checking against $F$ but still saves some time because some areas can be rejected easily without
further evaluation.)

Because higher minimum scores allow for a quicker rejection of many areas, they produce faster
searches. Using all the shortcuts described above, NDM can analyze data sets of medium size in
reasonable times. On a 266-MHz Pentium II machine, the areas with score ≥2 for a real matrix of
carabid beetles, with a grid of 10 × 15 and 33 species (actually occurring in 42 cells) can be found
in 664 sec, the areas with score ≥3 in 1.97 sec, and the areas with score ≥4 in 0.69 sec. The areas
with score ≥2 can be found in only 1.05 sec if the enlarged species distributions are calculated
allowing for up to four empty cells, and the differences from the correct results are minimal. For
larger problems, it is possible to find good solutions by constraining the search to a given region;
only those sets contained within the region are evaluated. The candidate regions can be selected by
analyzing the data with enlarged grid cells (e.g., reducing the number of rows and columns to a half
or a third) and then constraining the search to the corresponding region of the larger data set.

FURTHER CONSIDERATIONS

A possibility that has not been discussed so far is that of conflict between the areas with a
positive score under some of the criteria. It is of course possible, given conflicting
distributions, that two sets of cells, where one is a subset of the other, both have positive scores
(under criteria 2 through 4). The one with the largest score is the one more strongly supported by
the evidence. If two partially overlapping areas have the same score, either the evidence is
ambiguous regarding which of the areas is an area of endemism or both represent real phenomena. (If
each is supported by the congruent distribution of many taxa, the taxa may simply be responding to
different factors, such as terrestrial vs. aquatic organisms.) Another possibility is that several
subsets of an optimal area will also have some positive score. This result does not really represent
conflict but simply reflects the fact that some species may have their ranges further contracted. As
implemented in NDM, such smaller areas will not be considered; the program eliminates them. The
situation is different, of course, if the smaller area has a larger score (under some criteria), in
which case both areas are saved. Ideally, the comparison should take into account whether the scores
for the larger and smaller areas are given by different sets of species, and if so, it should retain
both areas (this option has not yet been implemented).

Whether an area X in conflict with another area Y of higher score is reported by the program or not
may depend in turn on whether area Y itself is in conflict with another area (e.g., Z) of even
higher score. If so, area Y must be eliminated (because it loses against Z), and X will be retained.
Thus, NDM cannot check for conflict between the areas as it finds them. If it did so, finding first
Y, then X, then Z, it would miss area X; when X is found and compared to Y, it is discarded, and when
Z is found, it discards Y. Only finding X after both Y and Z are found would produce the correct
result. To avoid this problem, NDM stores all the areas with positive score that it finds during the
search, and only when the search is finished does it globally compare all the areas for conflict.

The four criteria for scoring can be used simultaneously during a search. Because each criterion is
a relaxation of the preceding one(s), the criteria do not actually contradict each other but give
instead complementary information.

A REAL EXAMPLE: REANALYSIS OF SCIOBIUS SCHÖNHERR

Morrone (1994) analyzed, using parsimony, a matrix of 47 species of Sciobius (Coleoptera:
Curculionidae) from South Africa in a grid with 21 occupied cells. On the consensus from 289 optimal
trees, Morrone (1994) proposed three areas of endemism (Fig. 6). Area 1 (cells I, J, L, and M) is
defined by having five species; there are seven species as synapomorphies of this area, but Morrone
indicated only five, presumably by considering that only these five were endemic. Area 2 (cells N,
O, R, S, and T) is defined by having two species, and area 3 (cell P) is defined by having seven
species (here, Morrone counted only the autapomorphies). The same matrix analyzed under criteria 3
and 4 with NDM (allowing for up to two empty cells around each cell in the area) obtained a total of
16 areas (in 1.17 sec running on a 266-MHz Pentium II machine), as shown in Figure 7 (the two single
cell areas, 4-3 and 4-5, N and P in Morrone’s grid, are not shown in that figure). The three areas
of largest score are the first three in Figure 7. Area 1 completely includes areas 8 and 9 (all of
lower $E_4$, but reported by NDM because they have higher $E_3$) and is completely included in areas
4, 5, 6, and 7 (all of lower $E_4$). Area 2 is in conflict with area 10 and completely includes area
11 (both of lower $E_4$). Area 3 is in conflict with area 13, is included completely in area 12, and
includes area 14 (the three with lower $E_4$). Area 1 of Morrone (1994) is equivalent to our area 1,
and area 2 of Morrone (1994) is equivalent to our area 12 (which is suboptimal according to our
criterion). Area 3 of Morrone is equivalent to one of our single-cell areas. Morrone’s analysis did
not recognize any possible equivalent of our area 2 nor any equivalent of the single cell area N.

Even for the areas that appear (identical or very similar) in the analysis of Morrone (1994), there
are significant differences in the species that define the areas. Area 1 is diagnosed under
criterion 4 as having 17 endemic species (see Fig. 8). Of these 17 species, only 7 (6, 7, 10, 12,
22, 23, and 46) appear as synapomorphies of the area when mapped most parsimoniously onto the
consensus tree; Morrone (1994) actually showed only 5 species (he did not show 12 and 23). (Morrone
[1994] mapped the characters onto the consensus tree; we consider that it is better to map the
individual trees, but we use the consensus for comparability with Morrone’s results.) Some of the
species contributing to the score under criterion 4 do not appear as synapomorphies under parsimony
because they are not found in all the cells forming the area. Species 23 (S. marshalli) appears as a
synapomorphy under parsimony, but because it is also present in the nonadjacent cells A and B it
seems illogical to count it as supporting endemicity. Thus, for the distribution of Sciobius, the
criteria proposed here produce more reasonable results than parsimony.
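
The search shortcuts described earlier for criteria 1 through 3 (ordered enumeration of cell sets, rejection of sets with a long gap, and a running intersection checked against a minimum score) can be sketched as follows. This is a hedged illustration, not the NDM implementation. It assumes cells are numbered row by row from the upper left corner, that adjacency includes diagonal neighbours, and that a "long gap" means two consecutive cell numbers in the set differing by more than columns + 1, so that no later cell can reconnect them. All function names are hypothetical, and only criterion 3 is scored.

```python
from itertools import product


def neighbours(cell, nrows, ncols):
    """Numbers of the (up to eight) cells surrounding `cell` in a row-major grid."""
    r, c = divmod(cell, ncols)
    out = set()
    for dr, dc in product((-1, 0, 1), repeat=2):
        rr, cc = r + dr, c + dc
        if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
            out.add(rr * ncols + cc)
    return out


def is_contiguous(cells, nrows, ncols):
    """True if every cell can be reached from any other through adjacent cells."""
    cells = set(cells)
    stack = [next(iter(cells))]
    seen = set(stack)
    while stack:
        for n in neighbours(stack.pop(), nrows, ncols) & cells:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == cells


def score_e3(cells, common, species, nrows, ncols):
    """Species found in every cell of the area and in no nonadjacent cell."""
    area = set(cells)
    near = area.union(*(neighbours(c, nrows, ncols) for c in area))
    far = set().union(*(species[i] for i in range(nrows * ncols) if i not in near))
    return common - far


def search(species, nrows, ncols, min_score):
    """Enumerate multi-cell areas whose criterion-3 score is at least min_score."""
    ncells = nrows * ncols
    results = []

    def extend(cells, common):
        if len(cells) > 1 and is_contiguous(cells, nrows, ncols):
            x = score_e3(cells, common, species, nrows, ncols)
            if len(x) >= min_score:
                results.append((tuple(cells), x))
        for nxt in range(cells[-1] + 1, ncells):
            if nxt - cells[-1] > ncols + 1:
                break          # long gap: no later cell could reconnect the set
            new_common = common & species[nxt]
            if len(new_common) >= min_score:
                extend(cells + [nxt], new_common)   # otherwise prune this branch

    for first in range(ncells - 1):
        if len(species[first]) >= min_score:
            extend([first], set(species[first]))
    return results


# Toy 3 x 4 grid: a, b and c co-occur in cells 0 and 1; c also occurs in cell 11.
species = [set() for _ in range(12)]
species[0] = {"a", "b", "c"}
species[1] = {"a", "b", "c"}
species[11] = {"c"}
areas = search(species, nrows=3, ncols=4, min_score=2)
print(areas)
```

In this toy case the only area reported is the pair of cells 0 and 1, supported by species a and b; species c is excluded because it also occurs in the nonadjacent cell 11. Raising `min_score` prunes more branches early, which mirrors the observation that searches for larger scores run faster.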
FIGURE 7. The 14 sets with positive $E_3$ or $E_4$ for the data of Morrone (1994).
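
The reason for comparing areas only after the search has finished can be illustrated with a small sketch. It is one possible reading of the rules discussed under Further Considerations, not NDM's actual procedure: an area is dropped if it partially overlaps a retained area of higher score, or if it is contained in a retained area whose score is at least as high, while a smaller area with a larger score is kept alongside the larger one. The `Area` type, the function names, and the three toy areas are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Area(NamedTuple):
    name: str
    cells: FrozenSet[Tuple[int, int]]
    score: int


def loses_to(x: Area, y: Area) -> bool:
    """True if x should be dropped because of y (assumed rule, see lead-in)."""
    if x.cells == y.cells or not (x.cells & y.cells):
        return False                  # identical or disjoint: no conflict
    if x.cells < y.cells:             # x is contained in y
        return x.score <= y.score     # a smaller area survives only with a larger score
    if y.cells < x.cells:             # x contains y: x is never dropped for that
        return False
    return x.score < y.score          # partial overlap: the higher score wins


def compare_during_search(found: List[Area]) -> List[Area]:
    """Naive variant: resolve conflicts as each area is found (order dependent)."""
    kept: List[Area] = []
    for area in found:
        if any(loses_to(area, k) for k in kept):
            continue
        kept = [k for k in kept if not loses_to(k, area)]
        kept.append(area)
    return kept


def compare_after_search(found: List[Area]) -> List[Area]:
    """Deferred variant: store everything, then compare globally, best areas first."""
    ranked = sorted(found, key=lambda a: (-a.score, -len(a.cells)))
    kept: List[Area] = []
    for area in ranked:
        if not any(loses_to(area, k) for k in kept):
            kept.append(area)
    return kept


# X conflicts with Y, Y conflicts with Z, and X and Z do not overlap.
X = Area("X", frozenset({(0, 0), (0, 1)}), 3)
Y = Area("Y", frozenset({(0, 1), (0, 2)}), 4)
Z = Area("Z", frozenset({(0, 2), (0, 3)}), 5)

print([a.name for a in compare_during_search([Y, X, Z])])  # ['Z']
print([a.name for a in compare_after_search([Y, X, Z])])   # ['Z', 'X']
```

Found in the order Y, X, Z, the naive variant discards X against Y and later discards Y against Z, so X is lost. The deferred variant retains X, because the only area it conflicts with (Y) is itself eliminated by Z.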
fragmentation (due to many possible causes) cannot be recognized as such. It would be desirable to
modify the criteria in such a way that disjunct areas can be recognized. Modifications of the
criteria for meaningful evaluation of disjunct areas are currently being investigated.

Another aspect that should be improved is the all-or-none aspect of the method; a given species
either contributes to the score or it does not. Ideally, species that adjust well to the expectation
of endemicity should contribute to the score more than species that adjust poorly (in a proportion
that depends on how well the species adjust to endemicity). A possibility is to weight a species
according to the proportion of cells in the area that are effectively occupied by the species or by
the ratio of occupied cells inside and outside the area, or by both methods.
Aside from those possible improvements, a better insight into the properties of the method can be
gained by testing the method on randomly generated distributional data. Another aspect that must be
studied more closely is the effect of the grid cell size on the results (for a brief discussion, see
Morrone, 1994). More detailed analyses along these lines are currently being carried out, and their
results will be published elsewhere.

ACKNOWLEDGMENTS

We thank the CONICET (PIP 4974) and FONCYT (PICT 01-04347) for support. Helpful comments from James
Carpenter, Jonathan Coddington, Peter Linder, Roderic Page, Norman Platnick, Martín Ramírez, and
reviewers Juan Morrone, Marco van Veller, and Rino Zandee are greatly appreciated.

REFERENCES

ALLARD, M., J. FARRIS, AND J. CARPENTER. 1999. Congruence among mammalian mitochondrial genes. Cladistics 15:75–84.
BROOKS, D. R. 1990. Parsimony analysis in historical biogeography and coevolution: Methodological and theoretical update. Syst. Biol. 39:14–30.
FAITH, D. P. 1992. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61:1–10.
GOLOBOFF, P. A. 2001. NDM and VNDM: Programs for analysis of endemicity. Distributed by the author, San Miguel de Tucumán, Tucumán, Argentina.
GOLOBOFF, P. A. 2002. Optimization of polytomies: State set and parallel operations. Mol. Phylogenet. Evol. 22:269–275.
HAROLD, A. S., AND R. D. MOOI. 1994. Areas of endemism: Definition and recognition criteria. Syst. Biol. 43:261–266.
LINDER, P. 2001. On areas of endemism, with an example from the African Restionaceae. Syst. Biol. 50:892–912.
MOILANEN, A. 1999. Searching for most parsimonious trees with simulated evolutionary optimization. Cladistics 15:39–50.
MORRONE, J. J. 1994. On the identification of areas of endemism. Syst. Biol. 43:438–441.
NELSON, G., AND P. LADIGES. 1996. Paralogy in cladistic biogeography and analysis of paralogy-free subtree. Am. Mus. Novit. 3167:1–58.
NELSON, G., AND N. I. PLATNICK. 1981. Systematics and biogeography: Cladistics and vicariance. Columbia Univ. Press, New York.
PAGE, R. D. M. 1994. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Syst. Biol. 43:58–77.
PLATNICK, N. I. 1991. On areas of endemism. Aust. Syst. Bot. 4:xi–xii.
PLATNICK, N. I. 1992. Patterns of biodiversity. Pages 15–24 in Systematics, ecology, and the biodiversity crisis (N. Eldredge, ed.). Columbia Univ. Press, New York.
PRESSEY, R. L., C. J. HUMPHRIES, C. R. MARGULES, R. I. VANE-WRIGHT, AND P. WILLIAMS. 1993. Beyond opportunism: Key principles for systematic reserve selection. Trends Ecol. Evol. 8:124–128.
RODRIGUES, A., J. ORESTES CERDEIRA, AND K. GASTON. 2000. Flexibility, efficiency, and accountability: Adapting reserve selection algorithms to more complex conservation problems. Ecography 23:565–574.
RONQUIST, F. 1997. Dispersal–vicariance analysis: A new approach to the quantification of historical biogeography. Syst. Biol. 46:195–203.
VANE-WRIGHT, R., C. J. HUMPHRIES, AND P. H. WILLIAMS. 1991. What to protect—systematics and the agony of choice. Biol. Conserv. 55:235–254.
WILLIAMS, P. H. 1996. WORLDMAP 4: Program and documentation. Distributed by the author, www.nhm.ac.uk/science/projects/worldmap

First submitted 11 December 2001; reviews returned 11 June 2002; final acceptance 22 July 2002
Associate Editor: Roderic Page