
Determining DNA Sequence

Two methods were originally invented around 1976, but only one is widely used: the chain-termination method invented by Fred Sanger.
The other is the Maxam-Gilbert chemical degradation method, which is still used for specialized purposes, such as analyzing DNA-protein interactions.
More recently, several cheaper and faster alternatives have been invented. It is hard to know which of these methods, or possibly another method, will ultimately become standard. We will discuss two of them: 454 pyrosequencing and Illumina/Solexa sequencing.

Sanger Sequencing
Sanger sequencing uses DNA polymerase to synthesize a labeled second DNA strand. DNA polymerase always adds new bases to the 3' end of a primer that is base-paired to the template DNA. The polymerase is modified to eliminate its editing (proofreading) function.
The method also uses chain-terminator nucleotides: dideoxynucleotides (ddNTPs), which lack the -OH group on the 3' carbon of the deoxyribose. When DNA polymerase inserts one of these ddNTPs into the growing DNA chain, the chain terminates, as nothing can be added to its 3' end. The result is a set of DNA fragments that terminate wherever a selected base appears in the stretch of DNA being sequenced.

Sequencing Reaction
The template DNA is usually single-stranded DNA, which can be produced from plasmid cloning vectors that contain the origin of replication from a single-stranded bacteriophage such as M13 or fd. The primer is complementary to the region in the vector adjacent to the multiple cloning site.
Sequencing is done in 4 separate reactions, one for each DNA base. All 4 reactions contain the 4 normal dNTPs, but each reaction also contains one of the ddNTPs. In each reaction, DNA polymerase starts creating the second strand beginning at the primer. When DNA polymerase reaches a base for which the ddNTP is present, the chain will either terminate, if a ddNTP is added, or continue, if the corresponding dNTP is added. Which one happens is random, based on the ratio of dNTP to ddNTP in the tube. However, all the second strands in, say, the A tube will end at some A base: you get a collection of DNAs that end at each of the A's in the region being sequenced.
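The random choice between dNTP and ddNTP can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the termination probability of 0.2 and the number of molecules are all hypothetical values chosen for illustration, and the strand is the one from the worked example later in these notes.

```python
import random

def terminate_length(new_strand, dd_base, p_terminate, rng):
    """Length at which one growing chain stops, or None if it never terminates."""
    for length, base in enumerate(new_strand, start=1):
        # Termination is only possible where the incoming base matches the ddNTP.
        if base == dd_base and rng.random() < p_terminate:
            return length
    return None

def run_reaction(new_strand, dd_base, n_molecules=2000, p_terminate=0.2, seed=0):
    """Sorted fragment lengths found in the tube that contains dd<dd_base>TP."""
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        length = terminate_length(new_strand, dd_base, p_terminate, rng)
        if length is not None:
            lengths.add(length)
    return sorted(lengths)

new_strand = "CGTAATCATGGTCAT"  # strand being synthesized, written 5'->3'
for dd_base in "ACGT":
    print(dd_base, run_reaction(new_strand, dd_base))
```

With enough molecules in the tube, every position of the selected base is represented by at least one fragment, which is why each lane of the gel shows one band per occurrence of that base.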

Electrophoresis

The newly synthesized DNA from the 4 reactions is then run (in separate lanes) on an electrophoresis gel. The DNA bands fall into a ladder-like pattern, spaced one base apart. The actual sequence can be read from the bottom of the gel up.
Automated sequencers use 4 different fluorescent dyes as tags attached to the dideoxynucleotides and run all 4 reactions in the same lane of the gel. Today's sequencers use capillary electrophoresis instead of slab gels. Radioactive nucleotides (32P) are used for non-automated sequencing.

Sequencing reactions usually produce about 500-1000 bp of good sequence.

DNA sequencing example


Problem Statement: Consider the following DNA sequence (from firefly luciferase). Draw the sequencing gel pattern that results from sequencing this template DNA with ddNTPs as the chain terminators: atgaccatgattacg...
Solution:

Given DNA template: 5'-atgaccatgattacg...-3'
DNA synthesized:    3'-tactggtactaatgc(ddCTP)...-5'
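The synthesized strand is the base-by-base complement of the template. A minimal sketch of that pairing rule follows; the function names are hypothetical and only the 15 bases shown in the example are used.

```python
# Watson-Crick pairing for lowercase DNA letters.
PAIR = str.maketrans("acgt", "tgca")

def complement(template):
    """Complementary strand written 3'->5', aligned under the 5'->3' template."""
    return template.translate(PAIR)

def reverse_complement(template):
    """The same complementary strand, written 5'->3'."""
    return complement(template)[::-1]

template = "atgaccatgattacg"
print("5'-" + template + "-3'")
print("3'-" + complement(template) + "-5'")
print("new strand 5'->3':", reverse_complement(template))
```

Writing the new strand 5'->3' gives the order in which the polymerase adds bases, which is also the order in which the bands are read off the gel.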

Gel pattern:

           +-------------------------+
lane ddATP |W |     |  ||            |
lane ddTTP |W|  |  |  |  |           |
lane ddCTP |W  |     |     |         |
lane ddGTP |W    ||       |          |
           +-------------------------+

Electric field: (-) at the wells, (+) at the far end. Fragment size decreases from left to right.
"W" indicates the well position, and "|" denotes the DNA bands on the sequencing gel.
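The band positions in each lane follow from the synthesized strand: a fragment that ends at base number n of the new strand is n bases long, and shorter fragments run farther from the well. The sketch below draws such a pattern under simplifying assumptions: one text column per base, a hypothetical function name gel_lanes, and the 15-base new strand of the example written 5'->3'.

```python
def gel_lanes(new_strand, width=25):
    """Return one text row per ddNTP lane; 'W' is the well, '|' is a band."""
    new_strand = new_strand.upper()
    n = len(new_strand)
    if n + 1 > width:
        raise ValueError("width too small for this strand")
    lanes = {}
    for dd_base in "ATCG":
        row = [" "] * width
        row[0] = "W"
        for length, base in enumerate(new_strand, start=1):
            if base == dd_base:
                # The longest fragment sits next to the well, the shortest farthest away.
                row[n + 1 - length] = "|"
        lanes[dd_base] = "".join(row)
    return lanes

for dd_base, row in gel_lanes("cgtaatcatggtcat").items():
    print("lane dd" + dd_base + "TP |" + row + "|")
```

Real gels do not space bands evenly, since migration distance depends on fragment length in a nonlinear way, so the even spacing here is purely schematic.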

This picture is an autoradiograph. The darkness of each band is proportional to the radioactivity from 32P-labeled adenosine in the synthesized DNA sample.

Reading a sequencing gel


+ Begin from the right, i.e., with the smallest DNA fragments.
+ The sequence will be read in the 5'-3' direction. This sequence will be exactly the same as the RNA that would be generated to encode a protein, with U in place of T.
+ As an example, in the problem given, the smallest DNA fragment on the sequencing gel is in the C lane, so the first base is a C. The next largest band is in the G lane, and so on. Therefore the sequence of the first two bases is CG.

The sequence of the first 30 or so bases of the DNA is:

CGTAATCATGGTCATATGAAGCTGGGCCGGGCCGTGC....

When this is made as RNA, its sequence would be:

CGUAAUCAUGGUCAUAUGAAGCUGGGCCGGGCCGUGC....

Note that the information content is the same; only the T's have been replaced by U's.
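Reading the gel amounts to sorting all bands by fragment length and noting which lane each one is in. A minimal sketch, assuming the band positions have already been converted to fragment lengths (the dictionary below is filled in by hand for the first 15 bases of the example, and both function names are hypothetical):

```python
def read_gel(bands):
    """bands maps a lane's base to its fragment lengths; returns the 5'->3' sequence."""
    # Pair every band with its lane, then read from the shortest fragment upward.
    by_length = sorted((length, base) for base, lengths in bands.items() for length in lengths)
    return "".join(base for _, base in by_length)

def to_rna(dna):
    """Same sequence written as RNA: every T becomes U."""
    return dna.upper().replace("T", "U")

bands = {
    "A": [4, 5, 8, 14],
    "T": [3, 6, 9, 12, 15],
    "C": [1, 7, 13],
    "G": [2, 10, 11],
}
dna = read_gel(bands)
print(dna)
print(to_rna(dna))
```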

Translating the DNA sequence


The order of amino acids in any protein is specified by the order of nucleotide bases in the DNA. Each amino acid is coded by a particular sequence of three bases (a codon).
To convert a DNA sequence into an amino acid sequence, first find the starting codon. The starting codon is always the codon for the amino acid methionine. This codon is AUG in the RNA (or ATG in the DNA):

GCGCGGGUCCGGGCAUGAAGCUGGGCCGGGCCGUGC....
              Met

In the given example the next codon is AAG. The first base (5' end) is A, so that selects the 3rd major row of the table. The second base (middle base) is A, so that selects the 3rd column of the table. The last base of the codon is G, selecting the last line in the block of four.

The codon table


5'-Base              Middle Base               3'-Base
            U(=T)    C        A        G
U(=T)       Phe      Ser      Tyr      Cys      U(=T)
            Phe      Ser      Tyr      Cys      C
            Leu      Ser      Term     Term     A
            Leu      Ser      Term     Trp      G
C           Leu      Pro      His      Arg      U(=T)
            Leu      Pro      His      Arg      C
            Leu      Pro      Gln      Arg      A
            Leu      Pro      Gln      Arg      G
A           Ile      Thr      Asn      Ser      U(=T)
            Ile      Thr      Asn      Ser      C
            Ile      Thr      Lys      Arg      A
            Met      Thr      Lys      Arg      G
G           Val      Ala      Asp      Gly      U(=T)
            Val      Ala      Asp      Gly      C
            Val      Ala      Glu      Gly      A
            Val      Ala      Glu      Gly      G

The AAG entry in the table is lysine (Lys). Therefore the second amino acid is lysine. The first few residues, and their codons, are as follows:

Met Lys Leu Gly Arg ...
AUG AAG CUG GGC CGG GCC GUG C..

This procedure is exactly what cells do when they synthesize proteins based on the mRNA sequence. Translation in cells occurs in a large complex called the ribosome.
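The table lookup can be sketched in a few lines. The codon table below is the standard genetic code listed in U, C, A, G order for each of the three positions; the function name translate and the rule of stopping at the first Term codon or at an incomplete final codon are assumptions made for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base (each U, C, A, G).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Term Term Cys Cys Term Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(sequence):
    """Translate from the first AUG; accepts RNA or DNA letters."""
    rna = sequence.upper().replace("T", "U")
    start = rna.find("AUG")
    if start == -1:
        return []  # no start codon found
    residues = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Term":
            break
        residues.append(amino_acid)
    return residues

print(" ".join(translate("GCGCGGGUCCGGGCAUGAAGCUGGGCCGGGCCGUGC")))
```

The final lone C in the example is ignored because it does not form a complete codon.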

Automated procedure for DNA sequencing

A computer read-out of the gel generates a false color image where each color corresponds to a base. Then the intensities are translated into peaks that represent the sequence.
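The step from intensities to a sequence can be illustrated very roughly as follows. This is a heavily simplified sketch and not how any particular sequencer's software works: it assumes the four dye signals have already been reduced to one intensity per peak position, and it simply calls the base whose dye is brightest. The function name and the numbers are made up for illustration.

```python
def call_bases(traces):
    """traces maps each base to a list of intensities, one value per peak position."""
    bases = list(traces)
    n_peaks = len(traces[bases[0]])
    called = []
    for i in range(n_peaks):
        # Pick the dye channel with the strongest signal at this peak.
        called.append(max(bases, key=lambda base: traces[base][i]))
    return "".join(called)

traces = {
    "A": [0.1, 0.0, 0.2, 0.9],
    "C": [0.9, 0.1, 0.1, 0.0],
    "G": [0.0, 0.8, 0.1, 0.1],
    "T": [0.0, 0.1, 0.7, 0.0],
}
print(call_bases(traces))
```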
