John-James Wilson, University of South Wales, Pontypridd, United Kingdom and Naresuan University, Phitsanulok, Thailand
Kong-Wah Sing, Kunming Institute of Zoology, Chinese Academy of Sciences, Yunnan, P.R. China
Narong Jaturas, Naresuan University, Phitsanulok, Thailand
© 2018 Elsevier Inc. All rights reserved.
Introduction
Just as species show differences in their morphology, ecology, and behavior, they also show differences in their DNA sequences
(Wilson et al., 2017). “DNA barcoding,” used in a broad sense, refers to the use of short, standardized DNA sequences as markers
for the recognition of species. When used more precisely, “DNA barcoding” refers to the technique of sequencing a short fragment
of the DNA sequence of the mitochondrial cytochrome c oxidase subunit I (COI) gene, the animal “DNA barcode,” from a
taxonomically unknown specimen and performing comparisons with a library of DNA barcodes from taxonomically known
specimens to establish a taxonomic identification.
DNA barcoding requires basic molecular biology methods (which pre-date the term DNA barcoding) to extract and amplify the
DNA barcode sequence fragment from the unknown specimen (Fig. 1). Generally, this sample of amplified DNA is then passed to
commercial companies for inexpensive Sanger sequencing, or, particularly in the case of mixed, bulk samples, for high-throughput
(next generation) sequencing. These molecular biology methods are not covered in this article, which focuses on bioinformatics
workflows following the receipt of digital DNA sequences from a sequencer (Fig. 1). For a step-by-step guide to the molecular
biology methods used for standard (animal) DNA barcoding see Wilson (2012) and for an approximation of costs in developing
countries see Sing et al. (2016). Brandon-Mong et al. (2015) provide an example of a method (bulk extraction and bulk PCR) for
use prior to high-throughput sequencing, i.e., DNA metabarcoding.
The ultimate aim of a DNA barcoding bioinformatics workflow is to (1) produce a “clean” or “reliable” digital representation of
the DNA barcodes, and (2) use these DNA barcodes to obtain information about the taxonomy of the unknown specimen through
algorithms enabling DNA sequence comparisons, in conjunction with DNA sequence libraries (Fig. 1).
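The comparison step in (2) can be pictured with a minimal Python sketch. It assumes the query and the reference barcodes are already aligned to the same length. Each reference is scored by percent identity over unambiguous positions, and the best match is reported only if it clears a cut-off. The function names `percent_identity` and `identify`, the toy library and the 98% threshold are all hypothetical illustrations. Real identification engines such as BOLD and GenBank BLAST use their own, more elaborate methods.

```python
# Hypothetical sketch: nearest-match identification by percent identity.
# Assumes every sequence is already aligned to the same length.


def percent_identity(a, b):
    """% identity over positions where both sequences carry A, C, G or T.

    Gaps ("-") and ambiguity codes (e.g. N) are skipped, not counted as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identify(query, library, min_identity=98.0):
    """Return (name, identity) of the closest reference, or (None, identity)
    when even the best match falls below the hypothetical cut-off."""
    name, best = max(
        ((ref_name, percent_identity(query, ref_seq)) for ref_name, ref_seq in library.items()),
        key=lambda pair: pair[1],
    )
    return (name if best >= min_identity else None), best


library = {  # toy, made-up reference "barcodes"
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAA",
}
print(identify("ACGTNCGTAC", library))  # ('Species A', 100.0)
```

Returning `None` below the threshold reflects the point made above. Without a sufficiently complete reference library, the "closest" record may still be the wrong species.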
Background/Fundamentals
It is important to realize that full exploitation of DNA barcodes for species recognition will only be possible after assembling a
comprehensive library linking organisms (and Linnaean taxonomy) with their DNA barcodes (see Wilson et al., 2017 for an
assessment of DNA barcode library coverage for insects). With this in mind, in an effort to promote DNA barcoding research
and DNA barcode library building (e.g., Wilson et al., 2013), we have been organizing and facilitating DNA barcoding workshops
in Southeast Asia, a megadiverse region with relatively low DNA barcode coverage in public libraries (Wilson et al., 2016). We
have created a website, DNA Barcoding Workshops (DBW), as a companion resource to our workshops, and much of the material
in this article derives from our experience running these workshops.
This article is intended as a guide for absolute beginners to DNA barcoding, and provides step-by-step instructions for basic
bioinformatics workflows following receipt of DNA sequences from a DNA sequencing facility.
Application
a) Usually your sequences will come back from the sequencing company by email in a zip (folder) file. An example zip file
containing 12 trace files (6 forward and 6 reverse, from 6 unknown specimens), representing the kind of zip file you could
receive from a sequencing company, is provided for download as Supplemental File 1.
b) Unpack (extract) the zip file to your desktop. This should create a regular folder on your desktop, which you can name Traces.
There are two sets of files for each sequence (e.g., NUMBEROFSAMPLE_F.ab1 and NUMBEROFSAMPLE_F.txt). The files you
are interested in have an extension .ab1 (e.g., NUMBEROFSAMPLE_F.ab1 and NUMBEROFSAMPLE_R.ab1). Delete the other
files in the folder.
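If you prefer to script steps a) and b), the Python sketch below uses only the standard library. It unpacks just the .ab1 trace files and skips the accompanying .txt files, so there is nothing left to delete by hand. The function name `extract_ab1` and the zip file name in the usage comment are hypothetical. The sketch also assumes the company's zip may contain subfolders, which it flattens into the Traces folder.

```python
import zipfile
from pathlib import Path


def extract_ab1(zip_path, dest_dir):
    """Extract only the .ab1 trace files from zip_path into dest_dir.

    Folders inside the zip are flattened, and every other file type
    (e.g. the matching .txt files) is skipped. Returns the sorted file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name  # drop any folder part of the path
            if info.is_dir() or not name.lower().endswith(".ab1"):
                continue
            (dest / name).write_bytes(zf.read(info))
            extracted.append(name)
    return sorted(extracted)


# Hypothetical usage (the zip file name is made up):
# desktop = Path.home() / "Desktop"
# print(extract_ab1(desktop / "sequencing_results.zip", desktop / "Traces"))
```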
For sequence editing we recommend the program CODONCODE ALIGNER, which is used at the Canadian Centre for DNA
Barcoding (CCDB). Information on CODONCODE ALIGNER, including a free trial version, can be found at the CodonCode
Corporation website. The following steps describe the process of Sanger sequence editing for multiple specimens whose DNA
barcodes were amplified using the primers LCO1490 and HCO2198 (see Wilson, 2012; Brandon-Mong et al., 2015). For practice
you can go through these steps using the example files in Supplemental File 1. For other primer sets, adapt accordingly.
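For reference, the Python sketch below stores the two primer sequences as they are commonly published for LCO1490 and HCO2198. Check them against your own primer order. The sketch also shows the reverse-complement operation that CodonCode Aligner performs in the steps below. This explains why HCO2198 appears only in reverse-complemented form (starting TGATTTTTTGG) at the far right of forward reads, as in step q). It also explains why a reverse read, once reverse complemented, begins just after the 3' end of LCO1490 (...AAAGATATTGG), as in step o). The names `PRIMERS` and `reverse_complement` are our own, hypothetical choices.

```python
# Complement table covering A/C/G/T and the IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Commonly published 5'->3' sequences (verify against your own primer order).
PRIMERS = {
    "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    "HCO2198": "TAAACTTCAGGGTGACCAAAAAATCA",
}


def reverse_complement(seq):
    """Complement each base, then reverse the order (5'->3' of the other strand)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, primer in PRIMERS.items():
    print(name, primer, "->", reverse_complement(primer))
# LCO1490 GGTCAACAAATCATAAAGATATTGG -> CCAATATCTTTATGATTTGTTGACC
# HCO2198 TAAACTTCAGGGTGACCAAAAAATCA -> TGATTTTTTGGTCACCCTGAAGTTTA
```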
a) Open CodonCode Aligner and choose Create a new project and press OK.
b) Go to File > Import > Add Folder..., navigate to the desktop and select the folder of traces [which should be named Traces if you
followed the suggestion above]. Click Open > Import.
c) To see the files you just imported, press ► beside the Unassembled Samples folder.
d) The .ab1 files should be of the form NUMBEROFSAMPLE_F.ab1, where the second part “F” refers to the direction, i.e., Forward (“R” for Reverse).
e) Sort the files by quality by double-clicking on Quality. Any sequences that are of very poor quality (look for a big difference
between the sequence length and the quality score; a higher quality score is better) can be deleted by highlighting the
sequence and clicking Edit > Move to Trash.
f) Next we will group our sequences by direction for easy editing.
DNA Barcoding: Bioinformatics Workflows for Beginners 3
g) Make sure the Unassembled Samples folder is highlighted. Select the Contig menu and move the cursor over Advanced Assembly.
From the options that appear select Assemble in Groups.
h) A new window will appear. Click the button Define name parts...
i) There are two name parts to our file names (see above). The first part of our file names refers to the number of the sample and
for our purposes the option in the Meaning menu (first row) can be left as Clone. Since the sample number is followed by an
underscore, choose _ (underscore) in the Delimiter menu next to Clone (if it is not already selected).
j) For the second row choose Direction in the Meaning menu. We can ignore the Delimiter menu for the Direction part because
there is nothing following the direction in our file names.
k) Delete all additional name parts that may appear in the window (if any), and next click Preview... to check how CodonCode
Aligner is interpreting the sample names.
l) Click Close to exit the preview. Click OK to return to the Assemble in Groups window.
m) We first want to assemble our samples according to direction. Choose Direction in the Name part: dropdown menu. Then click
Assemble. You should now have two folders, one called F with the forward sequences and one called R with the reverse
sequences. [Note: if you only sent your PCR products for sequencing in one direction, i.e., with one primer, then you will only
have one folder.]
n) We will deal with the reverse sequences first. The first step is to reverse complement the sequences. Highlight the R folder,
select Edit > Reverse complement.
o) Next we need to cut the primer from the sequence. Double click the R folder to open it. For the reverse sequences, you need to
find the forward primer motif and delete it from the beginning of the consensus sequence at the bottom of the window. You
will find the primer around 30 nucleotides from the end of the raw sequence. For example, you would need to delete the
section of the sequence marked below in bold and everything to the left of it. Highlight it on the consensus sequence at the
bottom of the window and press the Backspace key on the keyboard.
←AAAGATATTGGAACATTATATTTTATTTTT...
p) Next go to the opposite end of the consensus, the far right. Delete the consensus sequence from the point where the sequence gets
messy. This will be apparent from extensive green highlighting. For example [it will not look exactly like this], delete the section marked
in bold and everything to the right of it. Highlight it on the consensus sequence at the bottom of the window and press the Delete
key on the keyboard. Close the window.
... TCTTTTTTTGACCCTGCTGGTGGAGGGTTTGGTAGGAGGATG-
q) Double click the F folder to open it. Go to the far right of the consensus sequence and find the reverse complement of the reverse
primer motif at the very end. This should be at around 650 bp on the raw sequence. For example, you would delete the section
marked in bold and everything to the right of it. Highlight it on the consensus sequence at the bottom of the window and
press the Delete key on the keyboard.
... CAACATTTATTTTGATTTTTTGG-
r) Next go to the opposite end of the consensus, the far left, and delete the consensus sequence from the point where
the sequences get messy. This will be apparent from extensive green highlighting. For example [it will not look exactly
like this], you would delete the sequence in bold and everything to the left of it. Highlight the region on the consensus
sequence at the bottom of the window and press the Backspace key on the keyboard. Close the window.
←ATGCTTTTTTTTTKGGTGTTTAATCAGGACTAATTGGAACTTC
s) Dissolve both the F and R folders by highlighting them and clicking the button marked with a red X.
t) Now we are going to combine the forward and reverse sequences from each specimen into a contig. Highlight the Unassembled
Samples folder and open the Contig menu. Move the cursor over Advanced Assembly. From the options that appear select
Assemble in Groups. This time choose Clone in the Name part: menu, then click Assemble. [Note: if you only sent your PCR
products for sequencing in one direction (with one primer), then you will need to check each sequence individually rather
than checking a consensus (contig).] [Note: specimens that only sequenced successfully in one direction will have files
that remain in the Unassembled Samples folder.]
u) The contigs are likely to be in reverse complement orientation. Highlight every folder (contig) and select Edit > Reverse complement.
v) Open each folder (contig) in turn by double-clicking. Correct ambiguous positions (shown in red, in green highlight, and/or
as N) and gaps (“-”) in the consensus sequence by checking the original traces. This is done by double-clicking on the
consensus sequence at the bottom of the window. Always check both trace files (forward and reverse) and compare them.
[Note: the corrected consensus sequence should have NO gaps.]
w) Generally, if the traces conflict (i.e., different colored peaks appear in the same location on the forward and reverse
chromatograms), you can decide which is more reliable based on sequence quality (e.g., less background noise, taller peaks).
x) Check the contigs first, then check the individual single sequences in the Unassembled Samples folder, if any.
y) To export the consensus sequences, highlight all the folders, go File > Export > Consensus Sequences..., choose Current selection.
Open the Options and select Include gaps in FASTA but deselect all other options. Press Export. Save the file to the desktop as
sequences.fasta.
z) If necessary, to export single direction sequences, go File > Export > Samples..., choose Current selection. Press Export. Save the file
to the desktop as sequences_single.fasta.
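The bookkeeping behind steps m), o), q), t) and w) can be sketched in a few lines of Python. The sketch makes the logic explicit. It does not replace checking the chromatograms by eye. It assumes file names of the form CLONE_DIRECTION.ab1 split at the last underscore. CodonCode Aligner's own name-part parsing may differ. The messy-end cuts in steps p) and r) are a visual judgement and are not modelled here. All function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def parse_name(filename, delimiter="_"):
    """Split e.g. '12_F.ab1' into (clone, direction) = ('12', 'F')."""
    stem = PurePath(filename).stem  # drop the .ab1 extension
    clone, sep, direction = stem.rpartition(delimiter)
    if not sep:
        raise ValueError(f"no {delimiter!r} in {filename!r}")
    return clone, direction.upper()


def assemble_in_groups(filenames, by="direction"):
    """Group file names by one name part: 'direction' (step m) or 'clone' (step t)."""
    groups = defaultdict(list)
    for name in filenames:
        clone, direction = parse_name(name)
        groups[direction if by == "direction" else clone].append(name)
    return {key: sorted(names) for key, names in groups.items()}


def trim_left_through(seq, motif):
    """Delete the first occurrence of motif and everything to its left (step o)."""
    i = seq.find(motif)
    return seq if i == -1 else seq[i + len(motif):]


def trim_right_from(seq, motif):
    """Delete the last occurrence of motif and everything to its right (step q)."""
    i = seq.rfind(motif)
    return seq if i == -1 else seq[:i]


def resolve(base_f, q_f, base_r, q_r):
    """Step w): where aligned forward/reverse calls disagree, keep the call
    with the higher Phred quality (ties go to the forward read)."""
    if base_f == base_r:
        return base_f
    return base_f if q_f >= q_r else base_r


files = ["1_F.ab1", "1_R.ab1", "2_F.ab1", "2_R.ab1", "3_F.ab1"]
print(assemble_in_groups(files))              # grouped by direction
print(assemble_in_groups(files, by="clone"))  # one group per specimen
print(trim_left_through("NNAAAGATATTGGAACATTA", "AAAGATATTGG"))  # AACATTA
```

Specimen 3 in the toy list has only a forward read. It ends up alone in its clone group, just as single-direction files remain in the Unassembled Samples folder after step t).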
onboard the sequencer (e.g., onboard the MiSeq using the MiSeq Reporter software). Because the FASTQ outputs are large files, the
sequencing company probably will not send the files by email but will email you a link to a website from where you can download
the files, usually packed into a zip file (as with Sanger sequences). Two FASTQ files (paired-end files) are output from each sequencing
run, which you can think of as the forward and reverse sequences.
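To make the paired-end idea concrete, the Python sketch below reads a forward (R1) and a reverse (R2) FASTQ file in step. Each FASTQ record is four lines: an "@" header, the sequence, a "+" line, and a quality string. The sketch checks that the two files list the same reads in the same order. It assumes Phred+33 quality encoding (standard for current Illumina output) and strips old-style "/1" and "/2" pair suffixes from read IDs. The file names in the usage comment are hypothetical.

```python
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return  # end of file
        if (len(record) < 4 or not record[0].startswith("@")
                or not record[0][1:].strip() or not record[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        header, seq, _, qual = record
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        read_id = header[1:].split()[0]
        if read_id.endswith(("/1", "/2")):  # older Illumina pair suffixes
            read_id = read_id[:-2]
        # Phred+33 encoding: quality = ASCII code - 33
        yield read_id, seq, [ord(c) - 33 for c in qual]


def read_pairs(r1_handle, r2_handle):
    """Walk the forward (R1) and reverse (R2) files together, checking IDs match."""
    for rec1, rec2 in zip_longest(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if rec1[0] != rec2[0]:
            raise ValueError(f"unpaired reads: {rec1[0]} vs {rec2[0]}")
        yield rec1[0], rec1[1:], rec2[1:]


# Hypothetical usage (file names are made up; gzipped files need gzip.open(path, "rt")):
# with open("Reads/sample_R1.fastq") as r1, open("Reads/sample_R2.fastq") as r2:
#     for read_id, (fwd_seq, fwd_q), (rev_seq, rev_q) in read_pairs(r1, r2):
#         ...
```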
The following workflow describes the steps taken to process high-throughput reads for bulk arthropod samples whose DNA was
amplified using the primers mlCOIintF and HCO2198 (see Brandon-Mong et al., 2015). A zip file containing example FASTQ
paired-end files is available for download as Supplemental File 2. For practice you can go through these steps using these example
files. For convenience, save the files in a folder (called Reads) on your Desktop.
It is important to note that the steps provided below are crude methods for processing a very small number of high-throughput
sequencing reads. DNA metabarcoding is a relatively new field, and much work is being undertaken to develop methods that
reduce the number of “spurious” reads generated and retained by bioinformatics pipelines for high-throughput sequencing reads
(Brandon-Mong et al., 2015) (see Future Directions below). For FASTQ files larger than the example provided as Supplemental
File 2, CODONCODE ALIGNER is probably not a suitable program, and beginners may find it better to register and use the
applications provided on the GALAXY webserver. Because DNA metabarcoding is the focus of another article in this book, we
do not provide additional details here.
a) Open the PRINSEQ webserver (Schmieder and Edwards, 2011), click Use PRINSEQ and click on Upload Data.
b) Your files are FASTQ Paired-end so choose that option.
c) Select the two FASTQ files you have saved on your Desktop (in the Reads folder).
d) Under Please select the statistics you want to generate, choose None for all options, then click Continue.
e) Wait while PRINSEQ processes your data.
f) Once it is finished click Process Input Data.
g) Choose the options from Table 1.
h) Choose to Output the data as FASTA, Data passing all the filters (good).
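The idea behind step h), keeping only the "good" reads that pass every filter and writing them as FASTA, can be sketched in Python. This is not PRINSEQ's code, and Table 1 is not reproduced here. The thresholds below (minimum length 100, minimum mean quality 20, no Ns allowed) are placeholders to be replaced with the Table 1 settings, and all function names are hypothetical. Records are assumed to be (read_id, sequence, Phred scores) tuples.

```python
def mean_quality(scores):
    return sum(scores) / len(scores) if scores else 0.0


def passes_filters(seq, scores, min_len=100, min_mean_q=20.0, max_n_fraction=0.0):
    """Placeholder thresholds; substitute the values chosen from Table 1."""
    if not seq or len(seq) < min_len:
        return False
    if mean_quality(scores) < min_mean_q:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction


def good_reads_as_fasta(records, **thresholds):
    """records: iterable of (read_id, sequence, phred_scores) tuples.
    Returns the reads that pass every filter, as FASTA text."""
    lines = []
    for read_id, seq, scores in records:
        if passes_filters(seq, scores, **thresholds):
            lines.append(f">{read_id}\n{seq}")
    return "\n".join(lines) + "\n" if lines else ""


toy = [
    ("read1", "ACGT" * 30, [30] * 120),  # kept
    ("read2", "ACGT" * 10, [30] * 40),   # too short
    ("read3", "ACGN" * 30, [30] * 120),  # contains Ns
    ("read4", "ACGT" * 30, [10] * 120),  # low mean quality
]
print(good_reads_as_fasta(toy))
```

Each filter is a separate test, so relaxing one threshold (e.g., `min_len=30`) changes only that criterion.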
Sequence Alignment
The alignment of DNA barcode sequences is a necessary step before two or more DNA barcode sequences can be compared with one
another. Sequence alignment is the process of lining up nucleotides that are assumed to share a common ancestor
(i.e., thought to be homologous). BIOEDIT is a widely used program for small-scale sequence alignment and is free for use
by any and all interested parties. BIOEDIT can be downloaded from the program website, although it is no longer regularly maintained.
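To show what "lining up" means in practice, here is a minimal Python sketch of Needleman-Wunsch global pairwise alignment, the classic dynamic-programming method. It is not necessarily what BIOEDIT runs internally, and multiple alignment of many barcodes needs more than this. The scoring values (match +1, mismatch -1, gap -2) are arbitrary illustrative choices, and the function name is our own.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1  # gap in b
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1  # gap in a
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


top, bottom, total = needleman_wunsch("ACGT", "AGT")
print(top)     # ACGT
print(bottom)  # A-GT
print(total)   # 1
```

Because barcodes from related species rarely differ by insertions or deletions, well-edited barcodes usually line up end to end with few or no gaps.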
GenBank BLAST
a) Click Nucleotide BLAST on the BLAST homepage, paste the text from your FASTA file into the box Enter accession number(s), gi(s),
or FASTA sequence(s), and make sure the Database selection is Others.
b) BLAST pre-dates DNA barcoding and is used for a variety of purposes, so its output is a little more difficult to interpret. As with
BOLD, a list of library records is displayed, generally with the closest matching library sequence (i.e., the highest % Identity) at the
top. Four other statistics are supplied: Max score is the highest alignment score (bit score) between the query DNA barcode
and the library sequence segment (the higher the better; 1000 is very good); Total score and Query coverage are generally less
informative for a single, contiguous protein-coding fragment such as the animal DNA barcode; E-value is the most important statistic
for DNA barcoding and indicates the number of alignments with a particular score or higher expected by chance (the closer to 0 the
better). An example of a BLAST search result is shown in Fig. 3(b), where the sequence can be conclusively identified as Amathuxidia amythaon.
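If you save the hits as a text hit table rather than reading them on screen, a short Python sketch can rank them as described above. It assumes the standard 12-column tab-separated layout used by BLAST+ `-outfmt 6` (and, with "#" comment lines, `-outfmt 7`). Check the columns of your own download before relying on it. The sketch also includes the textbook relationship between bit score and E-value, E ≈ m·n·2^(−S′). Here m and n are the (effective) query and database lengths and S′ is the bit score. It shows why longer, well-matching barcodes drive the E-value towards 0. Function names and the toy hit lines are hypothetical.

```python
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
FLOATS = {"pident", "evalue", "bitscore"}
STRINGS = {"qseqid", "sseqid"}


def parse_hits(text):
    """Parse 12-column tab-separated BLAST hits, best (lowest E-value) first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        hit = {}
        for key, value in zip(FIELDS, cols):
            if key in FLOATS:
                hit[key] = float(value)
            elif key in STRINGS:
                hit[key] = value
            else:
                hit[key] = int(value)
        hits.append(hit)
    # Lowest E-value first; ties broken by higher % identity.
    return sorted(hits, key=lambda h: (h["evalue"], -h["pident"]))


def expected_evalue(bit_score, query_len, db_len):
    """Approximate E = m * n * 2**(-S'), with S' the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


toy = ("# toy hit table\n"
       "q1\tref_B\t92.1\t650\t51\t0\t1\t650\t1\t650\t1e-150\t900\n"
       "q1\tref_A\t99.5\t650\t3\t0\t1\t650\t1\t650\t0.0\t1195\n")
print([h["sseqid"] for h in parse_hits(toy)])  # ['ref_A', 'ref_B']
```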
Fig. 3 Examples of the results of sequence identification requests using (a) BOLD identification engine and (b) GenBank BLAST.
Neighbor-Joining in MEGA
The Molecular Evolutionary Genetics Analysis (MEGA) program is a widely used program for tree building and is free to
download from the program website (Tamura et al., 2013). Once a Neighbor-Joining (NJ) tree has been built, DNA barcodes
can be sorted into OTU ad hoc based on the tree branching pattern (topology) and branch lengths (see the NJ tree and discussion in
Sukantamala et al., 2017). Note that NJ trees are not technically “phylogenetic” trees. Phylogenetic trees are constructed on the
basis of synapomorphies, whereas NJ trees are phenetic trees, constructed on the basis of sequence similarity. DNA barcoding is
concerned with relationships among sequences at the “species” boundary rather than with the reconstruction of phylogeny, so NJ
trees are generally sufficient for the analysis of DNA barcodes.
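The Python sketch below shows what happens behind MEGA's menus. It computes pairwise Kimura 2-parameter (K2P) distances, a distance model commonly used for DNA barcodes. It then builds an NJ tree with the standard Saitou-Nei algorithm and writes the tree in Newick format (the tree.nwk kind of file used below). This is an illustrative re-implementation, not MEGA's code. The function names are hypothetical, negative branch lengths are clamped to 0, and ties between equally good pairs are broken by input order.

```python
import math

TRANSITIONS = ({"A", "G"}, {"C", "T"})  # purine<->purine, pyrimidine<->pyrimidine


def k2p(a, b):
    """K2P distance between two aligned sequences; gap/ambiguous sites are skipped."""
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x != y:
            if {x, y} in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # math.log raises ValueError if the sequences are too divergent (saturated);
    # max() also turns -0.0 into 0.0 for identical sequences.
    return max(0.0, -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q))


def distance_matrix(seqs):
    """{name: aligned sequence} -> (names, symmetric dict-of-dicts of K2P distances)."""
    names = list(seqs)
    dist = {n: {} for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a][b] = dist[b][a] = k2p(seqs[a], seqs[b])
    return names, dist


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor-joining; returns an unrooted Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}  # working copy of the distances
    newick = {a: a for a in nodes}         # subtree text for each active node
    k = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None  # pair minimising Q = (n-2)d(a,b) - r(a) - r(b)
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        k += 1
        u = f"#internal{k}"  # key for the new node; never printed
        newick[u] = f"({newick[a]}:{max(la, 0.0):.4f},{newick[b]}:{max(lb, 0.0):.4f})"
        rest = [c for c in nodes if c not in (a, b)]
        d[u] = {}
        for c in rest:
            d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = rest + [u]
    x, y, z = nodes  # join the last three nodes at a single central node
    lengths = ((x, 0.5 * (d[x][y] + d[x][z] - d[y][z])),
               (y, 0.5 * (d[x][y] + d[y][z] - d[x][z])),
               (z, 0.5 * (d[x][z] + d[y][z] - d[x][y])))
    return "(" + ",".join(f"{newick[t]}:{max(l, 0.0):.4f}" for t, l in lengths) + ");"


toy = {  # distances from the made-up tree ((A:1,B:2):3,(C:1,D:2))
    "A": {"B": 3, "C": 5, "D": 6}, "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 3}, "D": {"A": 6, "B": 7, "C": 3},
}
print(neighbor_joining(["A", "B", "C", "D"], toy))
# (C:1.0000,D:2.0000,(A:1.0000,B:2.0000):3.0000);
```

With real data you would call `distance_matrix` on your aligned barcodes and pass its output to `neighbor_joining`.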
a) From the bPTP homepage, click the Browse… button to locate and upload your NJ tree (select the Newick file, e.g., tree.nwk, not
the image file).
b) Leave all the settings as the defaults and enter your email address.
c) Click to refresh until the results appear. Two trees are displayed (a maximum likelihood solution and a Bayesian solution), but
they are likely to have the same topology.
d) Click Download delimitation results. The page that opens shows a list of groups (species) and the DNA barcodes contained
within them.
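Once downloaded, the delimitation results can be summarized with a short Python script, for example to list putative species represented by a single DNA barcode. We do not reproduce the exact bPTP output here. The sketch assumes each group appears as a line such as "Species 1 (support = 0.998)" followed by a comma-separated list of sequence names. Check this against your own download and adjust the regular expression if the layout differs. The example text and the function name are hypothetical.

```python
import re

# ASSUMED layout of the downloaded results (check your own file):
#   Species 1 (support = 0.998)
#        seq01,seq02,seq03
HEADER = re.compile(r"^Species\s+(\d+)\s*\(support\s*=\s*([0-9.eE+-]+)\)")


def parse_delimitation(text):
    """Return {group_number: {"support": float, "members": [names]}}."""
    groups, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = HEADER.match(stripped)
        if match:
            current = int(match.group(1))
            groups[current] = {"support": float(match.group(2)), "members": []}
        elif current is not None:
            groups[current]["members"] += [n.strip() for n in stripped.split(",") if n.strip()]
    return groups


example = """# Most supported partition (made-up example)
Species 1 (support = 0.998)
     seq01,seq02,seq03
Species 2 (support = 0.870)
     seq04
"""
groups = parse_delimitation(example)
singletons = [k for k, g in groups.items() if len(g["members"]) == 1]
print(len(groups), singletons)  # 2 [2]
```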
We have used DNA barcoding in several biodiversity studies in Southeast Asia (Wilson et al., 2016) covering a wide range
of animal groups including butterflies (Jisming-See et al., 2016), bats (Syaripuddin et al., 2014), sandflies (Polseela et al., 2016;
Sukantamala et al., 2017) and dragonflies (Casas et al., 2017). Two illustrative examples are provided below.
Discussion
DNA barcoding is being used by researchers across an increasing number of biological fields reflecting the fact that DNA sequence
information can be cheap and easy to obtain and can enable assignment of taxonomic names to organisms without requiring
researchers to be familiar with intricate morphological features. Likewise, molecular OTU can be suitable surrogates for partitioning
diversity into interoperable units for biodiversity studies, enabling researchers to obtain taxonomic data much faster than is
possible with traditional morphological approaches and making studies scalable across much larger taxonomic groups and wider
geographical regions (Wilson et al., 2017). Yet the prospect of DNA barcoding can be daunting for beginners (Wilson et al., 2016).
Mastering a basic bioinformatics workflow is essential to ensure the quality and reliability of data and to generate meaningful
results (Wilson and Sing, 2013).
Future Directions
High-throughput sequencers are replacing Sanger sequencers in most molecular applications. The subdiscipline of DNA
metabarcoding grew directly from a major advantage offered by high-throughput sequencing: it circumvents the sorting and
isolation of the thousands of individuals in bulk mixed samples of organisms (Brandon-Mong et al., 2015). However, short read
lengths and high error rates have limited the use of high-throughput sequencers for conventional, individual-specimen DNA
barcoding (Hebert et al., 2017), and in particular for DNA barcode reference library construction (Liu et al., 2017). The continued
reliance on Sanger sequencing has constrained reductions in the cost of DNA barcoding and led to uneven DNA barcoding efforts
around the world (Liu et al., 2017). Recently developed approaches using the latest generation of high-throughput sequencing
platforms (Pacific Biosciences SEQUEL and Illumina HiSeq 4000) have produced full-length DNA barcodes of equivalent length and
quality to those generated by Sanger sequencing, at 10-fold (Liu et al., 2017) to 40-fold (Hebert et al., 2017) lower cost.
Bioinformatics pipelines to complement these new approaches have been developed concurrently. The mBRAVE webserver,
developed by the team behind BOLD and directly linked to the BOLD reference libraries (Hebert et al., 2017), is an important
development and likely represents a landmark shift in standard DNA barcoding protocols.
Closing Remarks
In this article, we provide beginners with step-by-step instructions for converting raw DNA sequences into clean DNA barcodes
(sequence editing, sequence alignment), for using common tools to assign taxonomic names to DNA barcodes, and for clustering
DNA barcodes into OTU. As more researchers become comfortable with such bioinformatics workflows, and the DNA barcoding
community continues to grow, essential questions for society: “What is this specimen on an agricultural shipment?”, “Who eats
whom in this whole food web?”, and even “How many species are there?” become answerable (Adamowicz, 2015). It is promising
that capacity for DNA barcoding is growing in the parts of the world where it is needed most, particularly among the younger
generation of researchers who can easily connect with the barcoding analogy (Adamowicz, 2015; Wilson et al., 2016).
Acknowledgements
Kong-Wah Sing is supported by the Chinese Academy of Sciences President's International Fellowship Initiative. We thank the
BOLD team, especially Megan Milton, for their continuous support of our DNA barcoding workshops in Southeast Asia. We are
grateful to CodonCode Corporation for supplying teaching licenses during our workshops. We thank previous sponsors and hosts
of our DNA barcoding workshops: the Centre of Excellence in Fungal Research, the Department of Microbiology and Parasitology,
and the Faculty of Medical Science at Naresuan University, Phitsanulok, Thailand; the Zoological and Ecological Research Network,
and Museum of Zoology at the University of Malaya, Kuala Lumpur, Malaysia; the University of Nottingham Malaysia
Campus, Selangor, Malaysia; Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia; and the Asia-Pacific Network for
Global Change Research. We also thank the scientists who have helped facilitate our workshops: Paul Hebert, Brandon Mong Guo
Jie, Lee Ping Shin, Evan Chin, Kharunnisa Syaripuddin, Jedsada Sukantamala, Cheah Men How, Siti Azizah M Nor, Noor Adelyna
M Akib, Mr Foo, Mr Chin, and Elizabeth Clare.
Supplementary data associated with this article can be found in the online version at 10.1016/B978-0-12-809633-8.20468-8.
References
Adamowicz, S.J., 2015. International barcode of life: Evolution of a global research community. Genome 58, 151–162.
Brandon-Mong, G.J., Gan, H.M., Sing, K.W., et al., 2015. DNA metabarcoding of insects and allies: An evaluation of primers and pipelines. Bulletin of Entomological Research 105, 717–727.
Casas, P.A.S., Sing, K.W., Lee, P.S., et al., 2017. DNA barcodes for dragonflies and damselflies (Odonata) of Mindanao, Philippines. Mitochondrial DNA Part A, Online Early. doi:10.1080/24701394.2016.1267157.
Hebert, P.D.N., Braukmann, T.W.A., Prosser, S.W.J., et al., 2017. A sequel to Sanger: Amplicon sequencing that scales. bioRxiv 191619. Available at: https://doi.org/10.1101/191619.
Jisming-See, S.W., Sing, K.W., Wilson, J.J., 2016. DNA barcodes and citizen science provoke a diversity reappraisal for the “ring” butterflies of Peninsular Malaysia (Ypthima: Satyrinae: Nymphalidae: Lepidoptera). Genome 59, 879–888.
Liu, S., Yang, C., Zhou, C., Zhou, X., 2017. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world. GigaScience 6, 1–8.
Polseela, R., Jaturas, N., Thanwisai, A., Sing, K.W., Wilson, J.J., 2016. Towards monitoring the sandflies (Diptera: Psychodidae) of Thailand: DNA barcoding the sandflies of Wihan Cave, Uttaradit. Mitochondrial DNA Part A 27, 3795–3801.
Pons, J., Barraclough, T.G., Gomez-Zurita, J., et al., 2006. Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology 55, 595–609.
Puillandre, N., Lambert, A., Brouillet, S., Achaz, G., 2012. ABGD, Automatic barcode gap discovery for primary species delimitation. Molecular Ecology 21, 1864–1877.
Ratnasingham, S., Hebert, P.D.N., 2013. A DNA-based registry for all animal species: The Barcode Index Number (BIN) system. PLOS ONE 8, e66213.
Schmieder, R., Edwards, R., 2011. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864.
Sing, K.W., Syaripuddin, K., Wilson, J.J., 2016. How to rapidly accelerate biodiversity inventory in places where most of the species are unknown? Malayan Nature Journal 68, 131–134.
Sing, K.W., Wilson, J.J., 2017. Butterfly diversity at a recreation hotspot in Setiu Wetlands, Terengganu, Malaysia. In: Prosiding Seminar Ekspedisi Saintifik Tanah Bencah Setiu 2016. Selangor: WWF-Malaysia, pp. 86–96.
Sukantamala, J., Sing, K.W., Jaturas, N., Polseela, R., Wilson, J.J., 2017. Unexpected diversity of sandflies (Diptera: Psychodidae) in tourist caves in Northern Thailand. Mitochondrial DNA Part A 28, 949–955.
Syaripuddin, K., Kumar, A., Sing, K.W., et al., 2014. Mercury accumulation in bats near hydroelectric reservoirs in Peninsular Malaysia. Ecotoxicology 23, 1164–1171.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S., 2013. MEGA6: Molecular evolutionary genetics analysis Version 6.0. Molecular Biology and Evolution 30, 2725–2729.
Wilson, J.J., 2012. DNA barcodes for insects. In: Kress, W.J., Erickson, D.L. (Eds.), DNA Barcodes: Methods and Protocols. New York: Humana Press.
Wilson, J.J., Rougerie, R., Shonfeld, J., et al., 2011. When species matches are unavailable are DNA barcodes correctly assigned to higher taxa? An assessment using sphingid moths. BMC Ecology 11, 18.
Wilson, J.J., Sing, K.W., 2013. DNA barcoding can successfully identify Penaeus monodon, associate life cycle stages, and generate hypotheses of unrecognised diversity. Sains Malaysiana 42, 1827–1829.
Wilson, J.J., Sing, K.W., Floyd, R.M., Hebert, P.D.N., 2017. DNA barcodes and insect biodiversity. In: Foottit, R.G., Adler, P.H. (Eds.), Insect Biodiversity: Science and Society, second ed. Oxford: Blackwell Publishing Ltd, pp. 575–592.
Wilson, J.J., Sing, K.W., Lee, P.S., Wee, A.K.S., 2016. Application of DNA barcodes in wildlife conservation in Tropical East Asia. Conservation Biology 30, 982–989.
Wilson, J.J., Sing, K.W., Sofian-Azirun, M., 2013. Building a DNA barcode reference library for the true butterflies (Lepidoptera) of Peninsula Malaysia: What about the subspecies? PLOS ONE 8, e79969.
Zhang, J., Kapli, P., Pavlidis, P., Stamatakis, A., 2013. A general species delimitation method with applications to phylogenetic placements. Bioinformatics 29, 2869–2876.
Further Reading
Adamowicz, S.J., Chain, F.J.J., Clare, E.L., et al., 2016. From barcodes to biomes: Special issues from the 6th international barcode of life conference. Genome 59, v–ix.
Kress, W.J., Erickson, D.L. (Eds.), 2012. DNA Barcodes: Methods and Protocols. New York: Humana Press.
Relevant Websites
http://wwwabi.snv.jussieu.fr/public/abgd/
ABGD.
www.boldsystems.org
Barcode of Life Datasystems (BOLD).
www.mbio.ncsu.edu/bioedit/bioedit.html
BIOEDIT.
http://species.h-its.org/ptp/
bPTP.
www.codoncode.com
CODONCODE ALIGNER.
www.barcodingasia.weebly.com
DNA Barcoding Workshops (DBW).
www.usegalaxy.org
GALAXY.
https://blast.ncbi.nlm.nih.gov/Blast.cgi
GenBank BLAST.
http://www.megasoftware.net/
Molecular Evolutionary Genetics Analysis (MEGA).
www.prinseq.sourceforge.net
PRINSEQ.