
COMMENTARY

DNA barcodes: Genes, genomics, and bioinformatics


W. John Kress* and David L. Erickson
Department of Botany, MRC-166, National Museum of Natural History, Smithsonian Institution, P.O. Box 37012,
Washington, DC 20013-7012

Author contributions: W.J.K. and D.L.E. wrote the paper.
The authors declare no conflict of interest.
See companion article on page 2923.
*To whom correspondence should be addressed. E-mail: kressj@si.edu.
www.pnas.org/cgi/doi/10.1073/pnas.0800476105 PNAS | February 26, 2008 | vol. 105 | no. 8 | 2761–2762

It is not a coincidence that DNA barcoding has developed in concert with genomics-based investigations. DNA barcoding (a tool for rapid species identification based on DNA sequences) and genomics (which compares entire genome structure and expression) share an emphasis on large-scale genetic data acquisition that offers new answers to questions previously beyond the reach of traditional disciplines. DNA barcodes consist of a standardized short sequence of DNA (400–800 bp) that in principle should be easily generated and characterized for all species on the planet (1). A massive on-line digital library of barcodes will serve as a standard to which the DNA barcode sequence of an unidentified sample from the forest, garden, or market can be matched. Similar to genomics, which has accelerated the process of recognizing novel genes and comparing gene function, DNA barcoding will allow users to efficiently recognize known species and speed the discovery of species yet to be found in nature. DNA barcoding aims to use the information of one or a few gene regions to identify all species of life, whereas genomics, the inverse of barcoding, describes in one (e.g., humans) or a few selected species the function and interactions across all genes (Fig. 1). The work of Lahaye et al. (2) reported in a recent issue of PNAS brings the application of DNA barcoding one step closer to implementation in plants.

Fig. 1. The matrix of genetic information and taxonomic diversity, with DNA barcoding at one extreme (with high species diversity and limited genetic coverage) and genomics (with limited species diversity but complete gene description) at the other extreme.

The deceptively simple task of selecting an appropriate locus to serve as a plant barcode has been much more complex than expected and has engendered considerable debate. Despite the current lack of consensus on a universal plant barcode, taxonomists, ecologists, evolutionary biologists, and conservationists are already envisioning the application of a genetic identifier to a wide set of research and applied programs. Lahaye et al. (2) point out that plant DNA barcodes can be used to assess species identification in conservation biodiversity hotspots as well as hypothetically applied to monitoring the international trade in endangered species of orchids. Whole forest species inventories based on DNA barcodes are also now in progress in both the temperate zone (Plummers Island in Maryland and a park in New York) and the tropics (Forest Dynamics Plot in Panama and soon to be initiated at La Selva Biological Station in Costa Rica), which will allow the identification of plant tissue fragments in ecological investigations as well as quantitative comparisons of genetic diversity among forest sites. If the barcode marker is conservative enough (e.g., by including a well suited gene, such as rbcL, in a multilocus barcode), it will enable the construction of phylogenetic trees for all of the species in a forest, facilitating investigations of community structure (3) and functional trait evolution (4). The Forest Dynamics Plot is one of 20 sites located in tropical countries (Center for Tropical Forest Science; www.ctfs.si.edu/doc/index.php), which taken together encompass nearly 3.5 million trees representing 12% of all known tree species. A complete DNA barcode census is now planned for all of the woody plants at these sites. The resultant germplasm bank from this intercontinental application of DNA barcoding will open up new opportunities for DNA investigations ranging from community phylogenetics (5) to ecological genomics (6).

To be practical as a DNA barcode a gene region must satisfy three criteria: (i) contain significant species-level genetic variability and divergence, (ii) possess conserved flanking sites for developing universal PCR primers for wide taxonomic application, and (iii) have a short sequence length so as to facilitate current capabilities of DNA extraction and amplification. A short DNA sequence of 600 bp in the mitochondrial gene for cytochrome c oxidase subunit 1 (CO1) (7) has been accepted as a practical, standardized species-level barcode for animals (see www.barcoding.si.edu). The inability of CO1 to work as a barcode in plants (8) set off a race among botanists to find a more appropriate marker (9). A number of candidate gene regions have been suggested as possible barcodes for plants (10–14), but none have been widely accepted by the taxonomic community. This lack of consensus is in part due to the limitations inherent in a plastid marker relative to plant CO1, and also because a quantitative context for selecting a gene region as a barcode for plants has not been offered. Several factors must be considered and weighted in selecting a plant DNA barcode: (i) universal PCR amplification, (ii) range of taxonomic diversity, (iii) power of species differentiation, and (iv) bioinformatic analysis and application.

Lahaye et al. (2) report tests of the various loci and intergenic spacers that have already been proposed as plant barcodes against their favorite candidate: the plastid gene matK. Their article contains many of the right elements: a diverse sample of taxa in the flowering plants, a primer set for matK that increases universality, trials of their marker on species identification and discovery, and the application of barcodes to important environmental issues. This article is welcome, but as with many of the other publications that have proposed candidate plant barcodes, the authors omit quantitative criteria and standards that are necessary to compare the success and applicability of their favored locus against all others.

Successful universal PCR amplification across a wide range of plants must be the primary criterion for selecting a DNA barcode. A challenging tradeoff exists between universal PCR amplification and high rates of sequence divergence. This tradeoff, which is particularly problematic in coding loci, is less so in noncoding regions because universal primers are normally found in the highly conserved genes that flank the hypervariable intergenic spacers. The taxonomic community has wavered on setting a level of universal amplification (e.g., all land plants or just flowering plants?) and the simplicity of PCR conditions (one primer set for all taxa or multiple sets across taxonomic groups?) required for a barcode. DNA barcoding must be practical for a wide range of practitioners and, therefore, the methodology must be accessible and easily carried out by multiple users. The power of DNA barcoding is also directly proportional to the data available in the barcode library; building a very complete database will greatly increase the power of DNA barcoding (15). These considerations require a narrow, standard range of PCR conditions along with a limited (ideally one) set of PCR primers per locus that will provide a robust barcode marker for the widest range of taxa and users. Lahaye et al. (2) purport to have tested their barcode loci on the widest sample of taxa so far used in any published study. Although the number of species is the largest sample yet published on plants, 96% of those samples are in a single family, Orchidaceae. The other samples are spread across 23 families in 18 orders, which is less than half the families and orders sampled in earlier trials (12, 13). In addition, they report employing several primer pairs, rather than an optimal single pair, to successfully amplify matK across the samples. For broad universality and simplicity of use, matK has not yet been demonstrated to pass the test for a successful plant barcode.

A criterion related to PCR universality is the relative success of a barcode marker across the major lineages of land plants, including angiosperms, gymnosperms, ferns, and mosses. Lahaye et al. (2) tested matK only on angiosperms, explicitly stating that it is not important to select a barcode that works successfully across all land plants. In today’s ecosystems in which the vast majority of plants are angiosperms, some might argue that markers should be chosen that work best for these dominant land plants. Yet given that the purpose of a DNA barcode is to facilitate identification of unknown samples, including small isolated fragments of tissue, then the selected loci should work easily on all groups of green land plants.

Conceptually, any consistent, nonzero sequence variation that distinguishes two species should work as a DNA barcode. Furthermore, DNA barcodes do not require any demonstration of the homology of mutations as would be needed in a phylogenetic marker. In other words, low levels of divergence may be sufficient to distinguish among species even if not adequate to estimate phylogenetic relationships. Relative to CO1 in animals, the mean divergence level between species in plants is usually quite low (13, 16). Curiously, Lahaye et al. (2) reject the gene region that showed the highest divergence value (trnH–psbA) in favor of matK, which showed nearly 50% less interspecific divergence. As of yet a quantitative metric that can be used to compare barcode candidates does not exist. The use of a simple statistic that could be calculated as the product of the levels of PCR universality and sequence divergence would allow for direct comparisons between putative DNA barcode markers. The proportion of taxa that are successfully amplified and sequenced across a targeted test set together with the percentage of species pairs that are differentiated by a particular locus are independent characters that can be combined as a product of the two values into a single metric for comparison.

The simple comparative statistic proposed is relevant only when other factors are considered, including the effort required to recover the PCR amplicon and the number of different primers and reaction conditions used for sequencing each putative barcode locus. The suitability of a locus for large-scale DNA barcoding could easily be evaluated by comparing loci across the same set of taxa under a designated set of reaction conditions. Although not an explicit measure of how well a DNA barcode will perform at identifying species within a bioinformatics context, this statistic takes into account the intrinsic tradeoff inherent in a DNA barcode marker between the ability to amplify a locus and the rate of divergence of that locus across a phylogenetic range of taxa.

In conclusion, two final factors that may strongly affect how well barcode markers work at species identification and discovery are database design and sequence search strategies. To date the exact method or algorithm to be used in searching the barcode database has not been thoroughly investigated nor debated, particularly as regards a multilocus DNA barcode (14, 17). The algorithms used in the most commonly used databases, GenBank and the Barcode of Life Database (BOLD), are quite different. However, a plethora of additional sequence alignment methodologies are available, which can be evaluated for use in DNA barcodes with regards to the following: (i) the application of confidence limits to species assignment, (ii) the use of partial sequences in database searches, and (iii) the impact on search algorithms of sequence length variation due to insertion/deletion events and the informative nature of these mutations.

Clearly, DNA barcoding has great potential for enhancing ecological and evolutionary investigations if the right genetic markers are selected. The issues raised here, if carefully considered and implemented, will allow a rational selection of a plant DNA barcode based on a comparative and quantitative analysis.
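
The comparative statistic proposed earlier in this commentary (the proportion of test taxa successfully amplified and sequenced, multiplied by the proportion of species pairs that a locus differentiates) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not an implementation from the commentary: the function names and toy data are hypothetical, sequences are assumed to be pre-aligned with one sequence per species, and two species are counted as differentiated whenever their sequences differ at any position.

```python
from itertools import combinations


def pcr_universality(amplified):
    """Fraction of test taxa that were amplified and sequenced.

    `amplified` maps a taxon name to True/False (hypothetical input format).
    """
    if not amplified:
        raise ValueError("need at least one taxon")
    return sum(1 for ok in amplified.values() if ok) / len(amplified)


def pair_discrimination(sequences):
    """Fraction of species pairs whose sequences differ.

    Assumption: one pre-aligned sequence per species; any nonzero
    difference counts as differentiating the pair.
    """
    pairs = list(combinations(sorted(sequences), 2))
    if not pairs:
        raise ValueError("need at least two species")
    differing = sum(1 for a, b in pairs if sequences[a] != sequences[b])
    return differing / len(pairs)


def barcode_score(amplified, sequences):
    """Product of PCR universality and pairwise species discrimination."""
    return pcr_universality(amplified) * pair_discrimination(sequences)


# Toy data (invented for illustration): four taxa tried, three recovered.
toy_amplified = {"sp_a": True, "sp_b": True, "sp_c": True, "sp_d": False}
toy_sequences = {"sp_a": "ACGT", "sp_b": "ACGA", "sp_c": "ACGT"}
print(barcode_score(toy_amplified, toy_sequences))
```

For the toy data, universality is 3/4 and two of the three species pairs differ, so the score is 3/4 × 2/3 = 1/2. Scores for different loci are comparable only when they are computed over the same set of taxa and under the same designated reaction conditions, as the text notes.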

1. Savolainen V, Cowan RS, Vogler AP, Roderick GK, Lane R (2005) Phil Trans R Soc London Ser B 360:1805–1811.
2. Lahaye R, van der Bank M, Bogarin D, Warner J, Pupulin F, Gigot G, Maurin O, Duthoit S, Barraclough TG, Savolainen V (2008) Proc Natl Acad Sci USA 105:2923–2928.
3. Kembel SW, Hubbell SP (2006) Ecology 87:S86–S99.
4. Westoby M, Wright IJ (2006) Trends Ecol Evol 21:261–268.
5. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ (2002) Annu Rev Ecol Syst 33:475–505.
6. van Straalen NM, Roelofs D (2006) An Introduction to Ecological Genomics (Oxford Univ Press, London).
7. Hebert PDN, Ratnasingham S, deWaard JR (2003) Proc Roy Soc B 270(suppl):S96–S99.
8. Cho Y, Mower JP, Qiu YL, Palmer JD (2004) Proc Natl Acad Sci USA 101:17741–17746.
9. Pennisi E (2007) Science 318:190–191.
10. Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Proc Natl Acad Sci USA 102:8369–8374.
11. Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, Vermat T, Corthier G, Brochmann C, Willerslev E (2007) Nucleic Acids Res 35(3):e14.
12. Chase MW, Cowan RS, Hollingsworth PM, van den Berg C, Madriñán S, Petersen G, Seberg O, Jorgsensen T, Cameron KN, Carine M, et al. (2007) Taxon 56:295–299.
13. Kress WJ, Erickson DL (2007) PLoS ONE 2(6):e508.
14. Sass C, Little DP, Stevenson DW, Specht CD (2007) PLoS ONE 2(11):e1154.
15. Ekrem T, Willassen E, Stur E (2007) Mol Phylogenet Evol 43:530–542.
16. Shaw J, Lickey EB, Schilling EE, Small RL (2007) Am J Bot 94:275–288.
17. Little D, Stevenson DW (2006) Cladistics 22:1–21.

