
AP Biology
Comparing DNA Sequences to Understand Evolutionary Relationships With BLAST
Name_______________________________ Date_______________________ Block_____

Background.

Between 1990 and 2003, scientists working on an international research project known as the Human Genome Project were able to identify and map the approximately 20,000 genes that define a human being. The project also successfully mapped the genomes of other species, including the fruit fly, mouse, and Escherichia coli. The location and complete sequence of the genes in each of numerous species are now available for anyone in the world to access via the Internet.

Why is this information important? Being able to identify the precise location and sequence of human genes will allow us to better understand aspects of human development and genetic diseases. In addition, learning about the sequence of genes in other species helps us understand evolutionary relationships among organisms. Many of our genes are identical or similar to those found in other species.

Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take you nearly 10 years to read through the entire human genome to try to locate the same sequence of bases as that in fruit flies. This definitely isn't practical, so a sophisticated technological method is needed.

Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared in order to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest and search entire genomic libraries for identical or similar sequences in a matter of seconds. In this laboratory investigation, you will use BLAST to compare several genes, and then transfer the information into Biology Workbench to construct a cladogram.

A cladogram (also called a phylogenetic tree) is a visualization of the evolutionary relatedness of species. Figure 1 is a simple cladogram. Note that the cladogram is treelike, with the endpoints of each branch representing a specific species. The closer two species are located to each other, the more recently they share a common ancestor. For example, Selaginella (spikemoss) and Isoetes (quillwort) share a more recent common ancestor with each other than either shares with the other species in the figure. Figure 2 (next page) includes additional details, such as the evolution of particular physical structures called shared derived characters. Note that the placement of derived characters indicates that every species above the character label possesses that structure. For example, tigers and gorillas have hair, but lampreys, sharks, salamanders, and lizards do not have hair. The Figure 2 cladogram can be used to answer several questions. Which organisms have lungs? What three structures do all lizards possess? Did dry skin or hair evolve first?

Historically, only physical structures were used to create cladograms; however, current cladistics relies heavily on genetic evidence as well. Chimpanzees and humans share about 98% of their DNA, which would place them closely together on a cladogram. Humans and fruit flies share approximately 60% of their DNA, which would place them farther apart on a cladogram. Can you draw a cladogram that depicts the evolutionary relationship among humans, chimpanzees, fruit flies, and mosses?

Pre-lab Assignment. Complete items 2 and 3 in your lab notebook.

1. Work through http://www.ucmp.berkeley.edu/education/explorations/reslab/flight/main.htm to learn more about the evolution of flight and the development of phylogenetic trees using cladistics.
2. Use the data at right to construct a cladogram of the major plant groups.
3. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in glycolysis, an important reaction that produces molecules used in cellular respiration. The following data table shows the percentage similarity of this gene and the protein it expresses in humans versus other species. For example, according to the table, the GAPDH gene in chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical.
   a. Why is the percentage similarity in the gene always lower than the percentage similarity in the protein for each of the species? (Hint: Recall how a gene is expressed to produce a protein.)
   b. Draw a cladogram depicting the evolutionary relationships among all five species (including humans) according to their percentage similarity in the GAPDH gene.
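To make the idea of "percentage similarity" concrete, here is a minimal sketch of how percent identity between two already-aligned sequences could be computed. The function name `percent_identity` and the two short sequences are made up for illustration; they are not real GAPDH data, and real tools such as BLAST use more sophisticated scoring.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of compared positions at which two aligned sequences match.

    Both inputs must already be aligned (same length). Positions where
    either sequence has a gap character '-' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Made-up 10-base fragments, for illustration only (not real gene data).
toy_reference = "ATGGTGAAGG"
toy_other = "ATGGTCAAGG"
print(percent_identity(toy_reference, toy_other))  # 9 of 10 positions match
```

The same function works on amino acid sequences written in one-letter code, so a gene and the protein it encodes can each be compared in the same way.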

Part 1. Molecular Phylogeny and Marine Mammals


Adapted from: Maier, Caroline Alexandra. Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry. The American Biology Teacher; Vol. 63, No. 9, pp. 643-646 and Foglia, Kim and Stuart M. Brown. Walruses, Whales and Hippos, Oh My: Using Bioinformatics to Teach Cladistics and Evolution. <http://www.med.nyu.edu/rcr/rcr/course/NABT-whales.pdf> accessed 24 April 2011.

Walruses, whales, dolphins, seals, and manatees are all marine mammals. They all have streamlined bodies, legs reduced to flippers, blubber under the skin, and other adaptations for survival in the water. Although mammals evolved on land, these species have returned to the sea. Did they evolve from a single ancestor who returned to the ocean, or were there different return events and parallel (convergent) evolution that led to similar adaptations among these species?

It is not possible to go back in time to observe what happened, but DNA and protein sequences contain evidence about the evolutionary history of organisms and the relationships between living creatures. Once we collect and analyze DNA or protein sequences of marine and land mammals, it is possible that the data will reveal the evolutionary history of marine mammals.

This analysis uses a protein that all mammals share, the hemoglobin beta protein. Hemoglobin is a protein made of four polypeptide subunits, two alpha chains and two beta chains. Hemoglobin is a good molecule for this evolutionary analysis because it shows conservation across species, since it performs the essential function of carrying oxygen in the blood, and variation between species, due to random DNA mutations accumulating over time. In addition, many biologists have studied hemoglobin, so sequences from many different organisms are available in the GenBank database, a public database of gene and protein sequences, available at the National Center for Biotechnology Information (NCBI) website.

After obtaining the hemoglobin beta amino acid sequences of several mammalian species from GenBank, you will load that information into analysis software called Biology Workbench. Using the tools available in Biology Workbench, you will compare the amino acid sequences and create phylogenetic trees to determine the evolutionary relationships between these species.

The goal of this analysis is to test hypotheses about the evolutionary ancestry of different marine mammals: Did marine mammals evolve from a single ancestor that returned to the ocean, or were there distinct return events from separate ancestors? A useful starting hypothesis is that all modern marine mammals have a single common land mammal ancestor.
Table 1: Accession Numbers of Species (to be used with NCBI site)

Species              Scientific Name                  Hemoglobin beta (Hbb) Accession Number
Abyssinian Hyrax     Procavia capensis habessinica    P02086
African Elephant     Loxodonta africana               P02085
Amazon Manatee       Trichechus manatus               P07415
Bottlenose Dolphin   Tursiops truncatus               P18990
Domestic cow         Bos taurus                       P02070
Domestic dog         Canis lupus familiaris           P60524
Harbor Seal          Phoca vitulina                   P09909
Hippopotamus         Hippopotamus amphibius           P19016
Human                Homo sapiens                     P68871
Minke Whale          Balaenoptera acutorostrata       P18984
Mouse                Mus musculus                     BAG16710.1
Pacific Walrus       Odobenus rosmarus divergens      P68046
Red Kangaroo         Macropus rufus                   P02107
Rhesus Monkey        Macaca mulatta                   AFE67078
Sperm Whale          Physeter catodon                 P09905.1

Procedure Part 1 - Accessing hemoglobin beta (Hbb) amino acid sequences and importing them into Biology Workbench

1. Using Firefox, go to www.ncbi.nlm.nih.gov
2. In the Search pull-down menu at the top, select Protein, and in the space below enter the accession number (located in Table 1 above) for the amino acid sequence of the Amazon Manatee. Select Go.
3. Make sure the display format is GenPept to view the information about the sequence. This page shows you the classification of the organism (Domain, Kingdom, Phylum, etc.), as well as relevant journal articles. Double-check with the scientific name listed in Table 1 to make sure you have the hemoglobin sequence for the correct organism.
4. At the top of the screen, beside Display, select FASTA. This will show the amino acid sequence in a format that can be read by the Biology Workbench program. Note that the sequence uses the one-letter abbreviations for the names of the amino acids.
5. To import the sequence into Biology Workbench, highlight the entire FASTA sequence, including the initial > symbol. Copy the highlighted sequence.
6. Open a new window in Firefox, and log on to http://workbench.sdsc.edu. Click on register for a free account. Set up your account. Then enter Biology Workbench 3.2.
7. At the bottom of the screen, select Protein Tools. On the next page, select Add New Protein Sequence and then Run.
8. In the label box, type Amazon Manatee.
9. Position the cursor in the top left corner of the sequence box and paste the copied amino acid sequence. Replace the FASTA identification line (everything after > and before the amino acid sequence) with the descriptive name (Amazon Manatee) used in the label box. To do this, position the cursor after the initial > symbol, highlight the rest of the identification line, and delete it. Type the descriptive name immediately after the > symbol. Make sure to keep the > and all of the amino acid sequence! Select save at the bottom of the screen.
10. Return to the window open to NCBI, and repeat steps 2-9 for each of the other species.
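The header edit described in step 9 can also be pictured as a small text operation. The sketch below is only an illustration of that edit, not part of the Biology Workbench procedure; the function name `relabel_fasta`, the accession-like header, and the short sequence are all hypothetical.

```python
def relabel_fasta(fasta_text: str, label: str) -> str:
    """Replace the identification line of a single FASTA record.

    Everything after the initial '>' on the first line is swapped for
    `label`; the sequence lines are kept exactly as they were.
    """
    lines = [ln.strip() for ln in fasta_text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    sequence_lines = lines[1:]
    if not sequence_lines:
        raise ValueError("FASTA record has no sequence lines")
    return "\n".join([">" + label] + sequence_lines) + "\n"


# Hypothetical record: the header and the sequence are made up for illustration.
toy_record = """>XX00000.1 hypothetical protein [Example species]
ACDEFGHIKL
MNPQRSTVWY
"""
print(relabel_fasta(toy_record, "Amazon Manatee"))
```

Note that the > symbol and every sequence line survive the edit, which is exactly what step 9 asks you to check before saving.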

Procedure Part 2 - Analyzing the relationships between different species using the Hbb sequence

1. First, align sequences for five species (Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo) and draw a tree showing evolutionary relationships based on the hemoglobin beta amino acid sequences.
   a. In Biology Workbench, select Protein Tools and select the five species' sequences (Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo) by checking the boxes beside the names at the bottom of the page.
   b. From the Protein Tools menu box, choose CLUSTALW, followed by Run and then Submit. (Note: Here and in other steps, the program may run automatically, and you may need only to select Submit.) Observe the alignment of sequences, paying attention to the consensus key. Which of the five species appears least related to the other four based on this sequence analysis?
   c. Click on Import Alignment to enter the Alignment Tools section of Biology Workbench.

2. Determine the genetic distance between sequence pairs.
   a. Click on the box to the left of the aligned sequences, then choose CLUSTALDIST, Run, and Submit.
   b. Observe the Clustal distance matrix and record it in a table in your lab notebook. These numbers indicate the degree of difference between the hemoglobin beta amino acid sequences from each species. A difference of 0.00 indicates identical sequences. As the difference between two sequences increases, their distance number increases.
   c. Which species is least related to the others based on analysis of the Hbb amino acid sequence?
3. Build a rooted phylogenetic tree of these five species.
   a. Click Return.
   b. From the Alignment Tools menu, choose DRAWGRAM, then Run and Submit. Draw the tree that appears on the screen in the data section.
   c. What does this tree suggest about whether or not all marine mammals share a single ancestor that returned to the ocean?
4. Determine the relationships of all fifteen mammalian species listed in Table 1.
   a. Click Return to go back to the main page.
   b. Repeat step 1 above, but this time select all fifteen species.
   c. Repeat steps 2 and 3 to create a phylogenetic tree that includes all fifteen species. Be sure to select the appropriate CLUSTALW group during step 2.
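To see what a distance number of this kind means, the sketch below computes a simple pairwise distance: the fraction of aligned positions at which two sequences differ. This is an assumed simplification chosen for illustration; CLUSTALDIST may calculate its values differently. The function names and the four short sequences are made up and are not real hemoglobin data.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of compared positions that differ (0.0 means identical)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where either sequence has an alignment gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def distance_table(aligned):
    """Distance for every unordered pair of names in the alignment."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


def most_distant(aligned):
    """Name whose summed distance to all other sequences is largest."""
    totals = dict.fromkeys(aligned, 0.0)
    for (x, y), d in distance_table(aligned).items():
        totals[x] += d
        totals[y] += d
    return max(totals, key=totals.get)


# Made-up aligned sequences, for illustration only.
toy_alignment = {
    "Species A": "ACDEFGHIKL",
    "Species B": "ACDEFGHIKM",
    "Species C": "ACDEYGHIRM",
    "Species D": "TCNEYPHWRM",
}

for (x, y), d in distance_table(toy_alignment).items():
    print(f"{x} vs {y}: {d:.2f}")
print("Most distant overall:", most_distant(toy_alignment))
```

Because each pair is computed only once, the output fills just one triangle of a square table, which is why half of the boxes in a distance table repeat data.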

Data Record Example

Make a table similar to the one below in your lab notebook to record the Clustal distance analysis for the hemoglobin beta sequences of Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo. Record species names in the boxes on the top and right side of the table. Record the genetic distance between each pair of species in the appropriate box. (NOTE: Shaded squares would repeat data, so they do not need to be filled in.)

Table 2. Model of table to be recorded in your lab notebook.
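One way to picture how a table of pairwise distances can become a rooted tree is a simple clustering method called UPGMA, which repeatedly joins the two closest groups. The sketch below is an assumption made for illustration: the tools in Biology Workbench may use a different tree-building method, branch lengths are left out, and the labels and distance values are made up rather than measured.

```python
def upgma(labels, distances):
    """Join the closest clusters repeatedly; return a Newick-style string.

    labels: list of names.
    distances: dict mapping (name_a, name_b) tuples to a distance.
    Branch lengths are omitted to keep the sketch short.
    """
    sizes = {name: 1 for name in labels}  # cluster text -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every remaining cluster is
        # the size-weighted average of the two old distances.
        for other in sizes:
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((merged, other))] = (
                size_a * d_a + size_b * d_b
            ) / (size_a + size_b)
        del d[pair]
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Made-up distances between four hypothetical species A-D.
toy_labels = ["A", "B", "C", "D"]
toy_distances = {
    ("A", "B"): 0.10,
    ("A", "C"): 0.30,
    ("B", "C"): 0.30,
    ("A", "D"): 0.70,
    ("B", "D"): 0.70,
    ("C", "D"): 0.70,
}
print(upgma(toy_labels, toy_distances))
```

In the printed string, each pair of parentheses is a node, so the innermost pair marks the two species joined first (the smallest distance) and the outermost pair marks the root.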

Draw the two phylogenetic trees in your lab notebook. Note that each node indicates the most recent common ancestor of two (or more) species. The lengths of the branches do NOT necessarily indicate relative time since divergence between species.

Phylogenetic Tree #1 - Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo
Phylogenetic Tree #2 - All fifteen species listed in Table 1.

Part 2. Molecular Phylogeny and the Tree of Life

Suppose one is interested in examining more branches of the tree of life. Can hemoglobin beta protein be used to study plants? Fungi? Think about the function of this protein and hypothesize as to the types of organisms that would produce this protein. A useful protein for studying a broad range of organisms (beyond vertebrates, beyond animals, even beyond multicellular creatures!) is cytochrome c. Look up this protein and find out what it does. How does its function help to explain why it is present in a very broad range of organisms? What other structures and processes are likely to be shared among all eukaryotes? Among both prokaryotes and eukaryotes?

Using the same steps you followed above, you will construct a phylogenetic tree using information stored in the NCBI database and the alignment tools of Biology Workbench. The list of organisms you will analyze is given in Table 3. Draw the resulting tree in your notebook.

Table 3: Accession Numbers of Species (to be used with NCBI site)

Species           Scientific Name            Cytochrome c Accession Number
Albacore Tuna     Thunnus alalunga           P81459
Bullfrog          Rana catesbeiana           ACO51922.1
Chicken           Gallus gallus              NP_001072946.1
Corn              Zea mays                   AFW81901.1
Domestic cow      Bos taurus                 NP_001039526.1
Domestic dog      Canis lupus familiaris     AEP27248.1
Fruit Fly         Drosophila melanogaster    AAA28437.1
Fungus            Neurospora crassa          AAA92156.1
Fungus            Candida albicans           AAB68996.1
Horse             Equus caballus             NP_001157486.1
Human             Homo sapiens               NP_061820.1
Potato            Solanum tuberosum          AFX66977.1
Snapping turtle   Chelydra serpentina        P00022.2
Thale Cress       Arabidopsis thaliana       AAB72175.1
Turkey            Meleagris gallopavo        P67882.2
Wheat             Triticum aestivum          P00068.1

Part 3. Bats aren't bugs!

Can you help clear up Calvin's misunderstanding? Choose five or six organisms and find out their scientific names. You may use some of the organisms listed above, if you wish. Check in NCBI to see if you can find the sequence for the protein you wish to study. (Does it make more sense to analyze hemoglobin sequences? Or cytochrome c? Hmmmm.) Follow the steps you used before to make a small phylogenetic tree that might help Calvin better understand the classification of bats.

Discussion Questions. Turn in written answers to these questions, along with your notes taken while completing the lab and your phylogenetic trees for all parts of the lab.

1. Why could hemoglobin beta protein be used to answer the question about the origin of marine mammals, but the second tree required the use of the cytochrome c protein?
2. Which gene must have evolved more recently? Why?
3. What have you learned about the origin of marine mammals? What do you find particularly interesting in the first phylogenetic tree? Explain in detail.

4. The chicken and the turkey are both birds and have the same sequence of amino acids in their cytochrome c protein. Explain how two different species can have identical cytochrome c and still be different species.
5. Does the phylogenetic tree based on cytochrome c protein seem consistent with morphological features of these different organisms? Discuss with your classmates and do some additional research if you are unclear about characteristics of specific organisms. Cite any online or print sources you use.
6. If the molecular data provides the complete instruction manual for an organism, why are structural similarities among living organisms and in the fossil record still important in determining relationships between species?
7. One of the uses of comparative evolution is in epidemiology, tracing the source of an infectious agent. In the early 1990s a young woman in Florida died of AIDS, even though she had no known risk factors for HIV infection. Comparative analysis of the gene sequences for the HIV-1 outer-envelope protein from the woman, from the woman's dentist, from other patients, and from a member of the local community (as a control) determined that she had been infected by her dentist, who may have also infected other patients during invasive dental procedures.
   a. What are some of the ethical and legal aspects involved in a case such as this?
   b. Do you think it is always important to trace the origin of transmitted diseases? Why or why not?
   c. HIV is a virus that rapidly evolves. Would this make it easier or more difficult to maintain an accurate phylogeny of HIV strains in a population?
   d. How could sequence alignment be used to help study the global spread of influenza virus, which often includes new strains each year?
