
120:202 Foundations of Biology: Cell and Molecular Biology Laboratory/Fall 2015

Lab No. 1 Bioinformatics: Self-Guided Internet-based Exercise on Databases for the Storage and Data Mining
STUDENT NAME (LAST, FIRST): Shehata, Rola SECTION No. _07__
INSTRUCTOR: Ms. Adams

Mr. Antikainen

Ms. Kim

Mr. Niknam

Mr. Nnah

Purpose
This exercise aims to introduce you to some of the relevant databases and bioinformatics tools for
examining and comparing different pieces of biological information. Biological databases are an
important resource (Maloney et al., 2010) for the study of biochemistry, molecular genetics, transmission
genetics, cell biology, evolution and many other branches of biological sciences.
Introduction
Biological databases contain enormous amounts of information about the sequences and structures of
nucleic acids (DNA and RNA) and proteins; gene structures and chromosomes; metabolic pathways and
enzymes; signaling mechanisms, etc. Some of them include software tools that can be used to analyze
such data. Often, the software can be used directly through a web browser (web apps). Freestanding
applications must be downloaded and installed on your computer or a local network.
Notes
The Internet hyperlinks are active in this Word document, which is the file you should work in. Do
not use the PDF contained in the manual. Enter your answers by double-clicking the phrase
STARTTTYPINGTHERE and typing. When you complete the exercise, provide your instructor with a
hard copy, submit via SafeAssign, or send by e-mail, as indicated.
Important: Always give your document a title that includes your name and other pertinent information.
"Untitled 1.docx" is not a good name, and neither is "Graph.xlsx" or "ExtraCredit.pdf". You can imagine how
many papers we get from students curiously named "Untitled 1". So, here's a suggestion (assuming that
you use Microsoft Word):
LAST-NAME_FIRST-NAME_202_SECTION_NN_Bioinformatics.docx. For example, Ms. Rachel Baker
sends a paper to Mr. Hank Cheng's section 3. So Ms. Baker gives her paper the unmistakable name
Baker_Rachel_202_03_Bioinformatics.docx.
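The naming pattern above can also be generated programmatically. The following is a minimal Python sketch, not part of the original handout; the function name lab_filename and its default arguments are assumptions chosen only to reproduce the example given for Ms. Baker.

```python
def lab_filename(last, first, section, course="202",
                 title="Bioinformatics", ext="docx"):
    """Build LAST-NAME_FIRST-NAME_202_SECTION_NN_Title.ext.

    The section number is zero-padded to two digits, as in the
    handout's example (section 3 becomes "03").
    """
    return f"{last}_{first}_{course}_{int(section):02d}_{title}.{ext}"


# Example from the handout: Ms. Rachel Baker, section 3.
print(lab_filename("Baker", "Rachel", 3))
```

The same helper could be reused for other file types by changing the hypothetical ext argument.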
I. Finding Databases in the World Wide Web
We'll start by finding databases (Honts, 2003). You may click the URLs in this document. Describe in a
short sentence the function of each website. The home page usually has a brief
description of what the website aims to accomplish. Some titles are revealing (e.g. OMIM = Online
Mendelian Inheritance in Man); others are not. For example, if you read the top of BLAST's first page,
you'll find: "BLAST finds regions of similarity between biological sequences." Click "more" to find a
further description.

120:202 Foundations of Biology CMB Laboratory

Summer Session I, 2015

Bioinformatics Working File, p. 2

General databases and tools for bioinformatics studies


National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/
Brief description: NCBI helps with science & health research by providing the public access to
information on.
BLAST
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Brief description: It is a search tool (like Google) that helps find similarities between biological
sequences in certain genomes.

PubMed
http://www.ncbi.nlm.nih.gov/pubmed
Brief description: This is a search engine for scientific medical research articles that are published
in journals.
Online Mendelian Inheritance in Man (OMIM)
http://www.ncbi.nlm.nih.gov/omim
Brief description: This is a free online database that is comprised of information on human genes
and genetic phenotypes.
NCBI Conserved Domain Search
http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
Brief description: This is a search engine that is used for comparing a specific nucleotide or protein
sequence against other sequences.
CDART: Conserved Domain Architecture Retrieval Tool
http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps
Brief description: CDART compares protein sequences with other protein sequences for
similarities.

Multiple sequence alignments


http://www.ebi.ac.uk/Tools/msa/
Brief description: This compares three or more biological sequences of similar lengths, where
homology is the main outcome that can be inferred.

Protein Data Bank


http://www.rcsb.org/pdb/
Brief description: This is mainly used to research the 3D shape of proteins and nucleic acids.

Kyoto Encyclopedia of Genes and Genomes (KEGG)


http://www.genome.jp/

Brief description: This is a database that uses bioinformatics to apply genomes to the outside
environment.
Access points for integrated suites of sequence analysis tools
ExPASy
http://www.expasy.ch/
Brief description: This website provides information and access to databases and software in
different scientific areas/fields.

European Bioinformatics Institute


http://www.ebi.ac.uk/index.html
Brief description: EMBL-EBI is purely research-based and offers information from previous
experiments, and training programs.
PRABI (Rhone-Alpes Bioinformatics Center)
http://www.prabi.fr/
Brief description: PRABI gathers research teams for future research, in bioinformatics and
biostatistics fields.
Biology Workbench
http://workbench.sdsc.edu/
Brief description: The Biology WorkBench is a web-based source for biologists to search common
protein and nucleic acid sequence databases.
Some resources for human genomics
The Human Genome (NCBI)
http://www.ncbi.nlm.nih.gov/genome/guide/human/
Brief description: The Human Genome of NCBI provides information on human genomes for
biomedical research.
Human Genome Browser Gateway (UCSC)
http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg10
Brief description: The UCSC Genome Browser provides access to the human reference genome sequence.

ENCODE
http://genome.ucsc.edu/ENCODE/
Brief description: Encode is a database that is mainly about human genomes and is accessible for
research purposes.
Databases with entire genomic sequences
National Center for Genome Resources
http://www.ncgr.org/
Brief description: NCGR is a nonprofit research institute that combines research about
bioinformatics and sequencing.

NCBI Human Genome Resources:


http://www.ncbi.nlm.nih.gov/genome/guide/human/
Brief description: This is a website that is used to couple data on human genome projects.
The J. Craig Venter Institute (formerly, The Institute for Genomic Research or TIGR):
http://www.jcvi.org/
Brief description: This institute was designed to merge several organizations that focused on
genome research.
Gramene: A Resource for Comparative Grass Genomics
http://www.gramene.org/
Brief description: This compiles the genomes of certain plant species, and includes certain crops.
Example of a specialized structure prediction tool
COILS Server
http://www.ch.embnet.org/software/COILS_form.html
Brief description: This server compares sequences and databases of known parallel coils and looks
for similarities.

Metabolic and signaling pathways


Escherichia coli
http://www.ecocyc.org/
Brief description: This is a database that is solely about the research done on the bacterium
Escherichia coli K-12 MG1655.
Arabidopsis thaliana
http://www.arabidopsis.org/biocyc/
Brief description: This is for looking up pathways, compounds, enzymes that are related to the
metabolic processes in plants.
Homo sapiens
http://www.hmdb.ca

Brief description: This is a freely available electronic database that contains detailed information
about small molecule metabolites found in the human body.
Additional learning resources
Taxonomy:
http://www.hyperdictionary.com/dictionary/taxonomy
Brief description: The definition of taxonomy according to this website is the study of how things
are classified.
Gene ontology:
http://www.yeastgenome.org/help/glossary.html
Brief description: This is a glossary for the terms used in the SGD database.
Phylogenetic trees:
http://encyclopedia.thefreedictionary.com/phylogenetic tree
Brief description: This is a website that describes phylogenetic trees as documentation of the
evolution of a certain species.
http://aleph0.clarku.edu/~djoyce/java/Phyltree/cover.html
Brief description: This is a website that allows one to practice building phylogenetic trees.
http://www.phylogenetictrees.com/segminator.php
Brief description: This is about John Archer's research in molecular biology and bioinformatics.
Multiple sequence alignment (protein)
http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeLink
Brief description: Part of BLAST, COBALT compares an inputted protein sequence with the ones
stored in its database.
Google Scholar
http://scholar.google.com/schhp?hl=en&tab=ws
Brief description: A search engine for peer-reviewed journals and published articles.
WolframAlpha
http://www.wolframalpha.com/
Brief description: Another search engine, but is used for more commonly asked questions.
Using your lecture textbook and the online resources above, define the terms below. Please do not use
Wikipedia. You may use resources such as Google Scholar (see URL above), which will refer you back
to original sources, or WolframAlpha, which was designed in a more scientific manner than any other
web search engine:
a) Protein secondary structure = The α-helices and β-sheets that form within the
polypeptide chain.

b) Taxonomy = The practice of classifying plants and animals according to their presumed natural
relationships, similarities of their structures, or origin.
c) Gene ontology = The common language that describes aspects of a gene product's biology.
d) Phylogenetic tree = A way to show evolutionary relationships among various species based on
similarities in their physical and genetic characteristics.
e) Multiple sequence alignment = The comparison of 3 or more biological sequences that are
similar in length.
f) Expressed sequence tag = A fragment of an mRNA sequence derived through
a single sequencing reaction.
g) Epitope = The part of an antigen molecule to which an antibody attaches itself.
h) Single nucleotide polymorphism = A single base-pair difference in the DNA sequence of
individual members of a species.
i) Allele = One of two or more alternative forms of a gene that appear at a particular location on a
particular chromosome and control the same characteristic.
j) Chromosome = A thread-like structure located inside the nucleus of animal and plant cells that
contains DNA.
k) Amplicon = A piece of DNA or RNA that is the product of natural or artificial
amplification/replication events.

The National Center for Biotechnology Information (NCBI)


NCBI is a comprehensive network of databases that include information on nucleotidyl sequences
(e.g. chromosomal DNA, mRNA, non-protein-coding RNAs), amino acyl sequences (proteins),
taxonomy, and genetically based diseases (also known as inborn errors of metabolism). Here's a
diagram that illustrates the relationships among these different databases: *

You may want to continue exploring NCBI. This link will take you to a comprehensive list of all
databases in it: http://www.ncbi.nlm.nih.gov/guide/all/#databases_.

II. Case Study


An Unknown Human Nucleotidyl Sequence
Specific Learning Objectives
1. Describe what GenBank files are and be able to read them.
2. Become familiar with the BLAST program (check NCBI website in the resources at the end of this
lab) and be able to use it.
NOTE: Your instructor may decide to assign you sequences that differ from the ones in this section.
If this is the case, enter the modifications to this document as necessary.
Part 1
The nucleotidyl-residue (or nucleotide, for short) sequence on the following page was obtained from a
human DNA sequencing project. You are given the task of identifying the location of this sequence within
the human genome (Alaie et al., 2012). The problem is that the human genome is made up of 3 billion
base pairs (bp). To check even 1000 bp by eye in search of this sequence is quite time-consuming (as you
will find out shortly). Imagine if you had to check a billion nucleotides for this sequence!
Start by scanning (by eye) this 3360-bp sequence in search of the location of the following short nucleotide
stretches. Devise your own method.
i) TATACTTCAGGAACTAATTCTGAAGCATCA and ii) TCTGTGCCTTTTTTATATCTTGGCAGGTAG
Mark the sequences on your printout of this document (underline or use a highlighter) or on the
electronic document, as requested by your instructor.

* Figure source: http://www.muhlenberg.edu/main/academics/biology/courses/bio152/bioinformatics_lab/index.html


Please note the time at the beginning of your search and answer the following questions once you have
located your sequence.
1. Describe the method you used to find the sequence stretches (visual comparison? computer-aided?).
I used the Ctrl+F find function on my computer, so it was computer-aided.
2. How long did it take for you to find your sequence?
Sequence i) Around a minute
Sequence ii) Around a minute
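
A computer-aided search like the one described in this part can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_all is hypothetical, the toy sequence is made up for the demonstration (it is not the 3360-bp sequence), and positions are reported 1-based, which is the usual convention for sequence coordinates.

```python
def find_all(sequence, motif):
    """Return the 1-based start positions of every exact occurrence of
    motif in sequence (overlapping occurrences included).

    Whitespace and line breaks are removed first, so a sequence pasted
    from a wrapped document can be searched as one continuous string.
    """
    seq = "".join(sequence.split()).upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert 0-based to 1-based
        start = seq.find(motif, start + 1)   # keep looking after this hit
    return positions


# Toy example: a short made-up sequence, wrapped across two lines.
toy = "GGATATACTTCAGG\nAACTAATTCTGAAGCATCAAA"
print(find_all(toy, "TATACTTCAGGAACTAATTCTGAAGCATCA"))
```

Removing line breaks before searching matters here: a stretch that straddles two printed lines is easy to miss by eye, and a plain text search that respects line breaks may miss it too.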

ACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCGCTGCCAGGGGGCGTGCGGCAGCGCGG
CGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCTGG
GCCTCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAG
ATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTTTCTACAAGGCATTTGTAAAGGAT
GTTCATGAAGATTCAATAACAGTTGCATTTGAAAACAACTGGCAGCCTGATAGGCAGATTCCATTTCAT
GATGTCAGATTCCCACCTCCTGTAGGTTATAATAAAGATATAAATGAAAGTGATGAAGTTGAGGTGTATT
CCAGAGCAAATGAAAAAGAGCCTTGCTGTTGGTGGTTAGCTAAAGTGAGGATGATAAAGGGTGAGTT
TTATGTGATAGAATATGCAGCATGTGATGCAACTTACAATGAAATTGTCACAATTGAACGTCTAAGATCT
GTTAATCCCAACAAACCTGCCACAAAAGATACTTTCCATAAGATCAAGCTGGATGTGCCAGAAGACTT
ACGGCAAATGTGTGCCAAAGAGGCGGCACATAAGGATTTTAAAAAGGCAGTTGGTGCCTTTTCTGTA
ACTTATGATCCAGAAAATTATCAGCTTGTCATTTTGTCCATCAATGAAGTCACCTCAAAGCGAGCACATA
TGCTGATTGACATGCACTTTCGGAGTCTGCGCACTAAGTTGTCTCTGATAATGAGAAATGAAGAAGCT
AGTAAGCAGCTGGAGAGTTCAAGGCAGCTTGCCTCGAGATTTCATGAACAGTTTATCGTAAGAGAAGA
TCTGATGGGTCTAGCTATTGGTACTCATGGTGCTAATATTCAGCAAGCTAGAAAAGTACCTGGGGTCAC
TGCTATTGATCTAGATGAAGATACCTGCACATTTCATATTTATGGAGAGGATCAGGATGCAGTGAAAAA
AGCTAGAAGCTTTCTCGAATTTGCTGAAGATGTAATACAAGTTCCAAGGAACTTAGTAGGCAAAGTAAT
AGGAAAAAATGGAAAGCTGATTCAGGAGATTGTGGACAAGTCAGGAGTTGTGAGGGTGAGGATTGA
GGCTGAAAATGAGAAAAATGTTCCACAAGAAGAGGAAATTATGCCACCAAATTCCCTTCCTTCCAATA
ATTCAAGGGTTGGACCTAATGCCCCAGAAGAAAAAAAACATTTAGATATAAAGGAAAACAGCACCCAT
TTTTCTCAACCTAACAGTACAAAAGTCCAGAGGGTGTTAGTGGCTTCATCAGTTGTAGCAGGGGAATC
CCAGAAACCTGAACTCAAGGCTTGGCAGGGTATGGTACCATTTGTTTTTGTGGGAACAAAGGACAGC
ATCGCTAATGCCACTGTTCTTTTGGATTATCACCTGAACTATTTAAAGGAAGTAGACCAGTTGCGTTTG
GAGAGATTACAAATTGATGAGCAGTTGCGACAGATTGGAGCTAGTTCTAGACCACCACCAAATCGTAC
AGATAAGGAAAAAAGCTATGTGACTGATGATGGTCAAGGAATGGGTCGAGGTAGTAGACCTTACAGA
AATAGGGGGCACGGCAGACGCGGTCCTGGATATACTTCAGGAACTAATTCTGAAGCATCAAATGCTTC
TGAAACAGAATCTGACCACAGAGACGAACTCAGTGATTGGTCATTAGCTCCAACAGAGGAAGAGAGG
GAGAGCTTCCTGCGCAGAGGAGACGGACGGCGGCGTGGAGGGGGAGGAAGAGGACAAGGAGGAA
GAGGACGTGGAGGAGGCTTCAAAGGAAACGACGATCACTCCCGAACAGATAATCGTCCACGTAATCC
AAGAGAGGCTAAAGGAAGAACAACAGATGGATCCCTTCAGATCAGAGTTGACTGCAATAATGAAAGG
AGTGTCCACACTAAAACATTACAGAATACCTCCAGTGAAGGTAGTCGGCTGCGCACGGGTAAAGATCG
TAACCAGAAGAAAGAGAAGCCAGACAGCGTGGATGGTCAGCAACCACTCGTGAATGGAGTACCCTA
AACTGCATAATTCTGAAGTTATATTTCCTATACCATTTCCGTAATTCTTATTCCATATTAGAAAACTTTGTT
AGGCCAAAGACAAATAGTAGGCAAGATGGCACAGGGCATGAAATGAACACAAATTATGCTAAGAATT
TTTTATTTTTTGGTATTGGCCATAAGCAACAATTTTCAGATTTGCACAAAAAGATACCTTAAAATTTGAA
ACATTGCTTTTAAAACTACTTAGCACTTCAGGGCAGATTTTAGTTTTATTTTCTAAAGTACTGAGCAGTG
ATATTCTTTGTTAATTTGGACCATTTTCCTGCATTGGGTGATCATTCACCAGTACATTCTCAGTTTTTCTTA
ATATATAGCATTTATGGTAATCATATTAGACTTCTGTTTTCAATCTCGTATAGAAGTCTTCATGAAATGCTA
TGTCATTTCATGTCCTGTGTCAGTTTATGTTTTGGTCCACTTTTCCAGTATTTTAGTGGACCCTGAAATGT
GTGTGATGTGACATTTGTCATTTTCATTAGCAAAAAAAGTTGTATGATCTGTGCCTTTTTTATATCTTGGC
AGGTAGGAATATTATATTTGGATGCAGAGTTCAGGGAAGATAAGTTGGAAACACTAAATGTTAAAGATG
TAGCAAACCCTGTCAAACATTAGTACTTTATAGAAGAATGCATGCTTTCCATATTTTTTTCCTTACATAAA
CATCAGGTTAGGCAGTATAAAGAATAGGACTTGTTTTTGTTTTTGTTTTGTTGCACTGAAGTTTGATAAA
TAGTGTTATTGAGAGAGATGTGTAATTTTTCTGTATAGACAGGAGAAGAAAGAACTATCTTCATCTGAGA
GAGGCTAAAATGTTTTCAGCTAGGAACAAATCTTCCTGGTCGAAAGTTAGTAGGATATGCCTGCTCTTT
GGCCTGATGACCAATTTTAACTTAGAGCTTTTTTTTTTTAATTTTGTCTGCCCCAAGTTTTGTGAAATTTT
TCATATTTTAATTTCAAGCTTATTTTGGAGAGATAGGAAGGTCATTTCCATGTATGCATAATAATCCTGCA
AAGTACAGGTACTTTGTCTAAGAAACATTGGAAGCAGGTTAAATGTTTTGTAAACTTTGAAATATATGGT
CTAATGTTTAAGCAGAATTGGAAAAGACTAAGATCGGTTAACAAATAACAACTTTTTTTTCTTTTTTTCT
TTTGTTTTTTGAAGTGTTGGGGTTTGGTTTTGTTTTTTGAGTCTTTTTTTTTTAAGTGAAATTTATTGAGG
AAAAATA

Part 2
Let us explore the efficiency of using vast online databases and online search tools to locate and identify
unknown nucleotide sequences. One such search tool is called BLAST (Basic Local Alignment Search
Tool). This program compares a nucleotidyl (DNA, RNA) or amino acyl (protein) sequence of interest to
online databases, looking for regions of local similarity, and calculates the statistical significance of
matches. One such online database is NCBI's GenBank, which contains the sequences of at least three
full-length human genomes and, being hosted by the National Library of Medicine (a branch of the National
Institutes of Health), is free to the public.
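To make the idea of "local similarity" concrete, here is a small Python sketch of the Smith-Waterman local alignment score. It is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic and also reports statistical significance, neither of which is reproduced here. The function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b,
    using a simple linear gap penalty.

    Only the score is returned, not the alignment itself, so two rows of
    the dynamic-programming table are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            cell = max(0, diag, up, left)  # 0 lets an alignment restart anywhere
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two otherwise unrelated strings that share the stretch "ACGT".
print(local_alignment_score("TTTTACGTTTTT", "GGGACGTGGG"))
```

The floor of zero in each cell is what makes the alignment local: a poor region does not drag down the score of a well-matching stretch elsewhere in the sequences.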
Finding sequences of known (or putative) function in a database that have similarity to your sequence of
interest may allow you to identify the gene family to which your sequence belongs or the functional
significance of your sequence, if any. You will use a BLAST search to uncover information about an
unknown sequence. Copy and paste the unknown sequence (either the one from the previous page or as provided
by your section's instructor) into a new Word document and save it on your computer's hard drive. Give it
the title 202_Test_Sequence_LAST_FIRST (example: 202_Test_Sequence_YADAV_MAHESH.docx).
1. Go to NCBI BLAST website at http://www.ncbi.nlm.nih.gov/BLAST/
2. Click the "BLAST" link (in the rightmost column).
3. In the resulting page (NCBI/BLAST Home, at http://blast.ncbi.nlm.nih.gov/Blast.cgi), click on
the link "nucleotide blast". Copy the first line of the nucleotide sequence in the Word document and
paste it in the "Enter Query Sequence" box. (The top line, preceded by the > sign, is the
description of what the sequence is.)
4. Leave the settings as they are, but make sure that "Human genomic + transcript" is selected in the
"Search Set" options. Scroll to the bottom of the page and click the BLAST button in the left-hand
corner. Wait for results. Did your sequence find any matches in the human genome database?
No, there was no significant similarity found.
5. What could be the reason for this result?
STARTTTYPINGGHERE
6. Now try a longer sequence. Copy the first three lines and paste this sequence into the "Enter
Query Sequence" box and click BLAST again. Did your query match any sequence in the human
genome database?
STARTTTYPINGGHERE
If so, what match did it locate?
STARTTTYPINGGHERE
7. Next, copy one line that is roughly in the middle of the provided sequence, paste it into the
"Enter Query Sequence" box, and run the BLAST search again. Did you get a result this time?
STARTTTYPINGGHERE
8. Propose a reason why this one line yielded a different result than the one line at the beginning
of the sequence.
STARTTTYPINGGHERE
9. Click on the first of the matches that your search yielded. This match should be with a sequence
within GenBank. What is the name of this gene? What is the Sequence ID?
STARTTTYPINGGHERE
10. What chromosome is this located in? At what location of this gene?
XM_012665304 at 2152 bp

Part 3
A mature, fully processed messenger RNA (mRNA) contains nucleotide triplets in a particular sequence
that are read from an initiation codon (AUG) up to one or two termination codons (out of three: UAG,
UAA, UGA). The expression of a eukaryotic gene is controlled by DNA sequences called regulatory
regions. The regulatory regions include the gene's promoter, which binds RNA polymerase once the
transcription factors have bound the DNA and made that site accessible, and one or more enhancers that
also bind transcription factors and contribute to the control of gene expression.
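Reading triplets from an initiation codon to a termination codon can be illustrated with a short Python sketch. It works on the DNA coding strand, so ATG, TAG, TAA and TGA stand in for AUG, UAG, UAA and UGA. The function name first_orf is hypothetical, and the sketch is deliberately simplified: it ignores introns, the reverse strand, and any sequence context around the start codon.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}  # DNA equivalents of UAG, UAA, UGA


def first_orf(dna):
    """Return (start, end) for the first ATG that is followed, in the same
    reading frame, by a stop codon; return None if there is none.

    Indices are 0-based and half-open, so dna[start:end] is the open
    reading frame including its start and stop codons.
    """
    seq = "".join(dna.split()).upper()
    start = seq.find("ATG")
    while start != -1:
        # Step through the sequence one codon (three bases) at a time.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = seq.find("ATG", start + 1)
    return None


# Made-up example: ATG AAA TTT TAG flanked by a few extra bases.
example = "CCATGAAATTTTAGGG"
print(first_orf(example))
```

Note that the TGA overlapping the start codon in the example is skipped because it is out of frame; only triplets counted from the ATG are read as codons.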
Usually, the expression of a gene can be modified if one of its regulatory regions undergoes a mutation.
This mutation may be of immense significance, even if the change involves a single base substitution,
since a transcription factor's recognition of the site is sequence-specific. Mutations may involve more
substantial changes to the gene's regulatory regions, such as multiple nucleotide deletions, or, as in the
case of the gene under study in this lab, multiple nucleotide additions, which may eventually result in the
silencing of this gene.
The gene you searched codes for the so-called fragile-X mental retardation protein (FMRP). The promoter
of this gene contains a variable number of the trinucleotide repeat CGG. Individuals with no disease
(normal phenotype or wild type) have promoters containing <60 CGG repeats. Individuals whose
promoters contain 60 to 200 trinucleotide repeats are said to possess a premutation that renders them
susceptible to movement problems (ataxia) later in life. Individuals whose promoters have >200 CGG
trinucleotide repeats are afflicted with fragile-X syndrome and display a wide range of symptoms that
include mental retardation, large testes, etc. In turn, FMRP is involved in the transport of RNA transcripts
to polyribosomes located at sites of protein synthesis. In neurons these sites include the terminals of
axons. Loss of expression of FMRP has far-reaching consequences for an affected individual.
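The repeat thresholds in the paragraph above lend themselves to a small Python sketch that counts CGG repeats and assigns a category. The function names are hypothetical, and the counting rule is an assumption made for illustration: only the longest uninterrupted run of CGG is counted, so repeats broken up by other triplets are treated as separate runs. The cut-offs (<60, 60 to 200, >200) are the ones stated in this handout.

```python
import re


def longest_cgg_run(dna):
    """Number of CGG units in the longest uninterrupted run of CGG."""
    seq = "".join(dna.split()).upper()
    runs = re.findall(r"(?:CGG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeats(n_repeats):
    """Apply the handout's thresholds to a CGG repeat count."""
    if n_repeats < 60:
        return "normal"
    if n_repeats <= 200:
        return "premutation"
    return "full mutation (fragile-X syndrome)"


# Made-up example with three consecutive CGG units.
toy = "AACGGCGGCGGTT"
n = longest_cgg_run(toy)
print(n, classify_repeats(n))
```

A sequence pasted from a wrapped document can be passed in directly, since whitespace and line breaks are removed before counting.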
Questionnaire
1. Look at the sequence you searched using the BLAST program.
Would you predict that this gene comes from a normal person, a person with a premutation, or a
person afflicted with fragile-X syndrome?
STARTTTYPINGGHERE
Explain your reasoning.
STARTTTYPINGGHERE
2. We used the default database when conducting our BLAST search. This database contains only
human genome sequences. Imagine that the sequence you subjected to the BLAST search yielded
no matches (regardless of the length of the sequence you entered into the Query box).
What would you infer about that sequence?
STARTTTYPINGGHERE
3. What result would you predict if we searched that sequence against all known sequences?
STARTTTYPINGGHERE
A database containing all known nucleotide sequences exists and is called "nucleotide collection
(nr/nt)". This database can be found on the BLAST site under "Choose Search Set". At "Database"
you will see that "Human genomic + transcript" is selected. Select "Others" instead and you
will find that the "nucleotide collection (nr/nt)" database is automatically selected. Run your
search against this vast database.
4. How do your results differ from the original search?
STARTTTYPINGGHERE
5. Describe the capabilities of a BLAST search.
STARTTTYPINGGHERE

6. What could be the possible limitations of a BLAST search?


STARTTTYPINGGHERE
7. BLAST is often nicknamed the "Google" of DNA search tools. Compare a BLAST search to a
Google search and list one possible similarity and one possible difference.
STARTTTYPINGGHERE

Discussion
You are given a sequence of DNA and told that it is human. You are asked to find out its identity and
whether it has similarity to sequences in other organisms. Please describe the bioinformatics tool, the
database, and the procedure you would use to find such information. Give two possible outcomes of your
search.
STARTTTYPINGGHERE

Bibliography
Alaie A, Teller V, Qiu W-g (2012) A bioinformatics module for use in an introductory biology laboratory.
Am Biol Teach 74:318-332.
Honts JE (2003) Evolving strategies for the incorporation of bioinformatics within the undergraduate cell
biology curriculum. CBE Life Sci Educ 2:233-247.
Maloney M, Parker J, LeBlanc M, Woodard CT, Glackin M, Hanrahan M (2010) Bioinformatics and the
undergraduate curriculum. CBE Life Sci Educ 9:172-174.
