
Transcription

- RNA molecules are produced by copying part of the nucleotide sequence of DNA into a complementary sequence in RNA, a process called transcription.
- During transcription, RNA polymerase binds to DNA and separates the DNA strands. RNA polymerase then uses one strand of DNA as a template from which nucleotides are assembled into a strand of mRNA.
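The template-copying step can be sketched in a few lines of Python. This is only an illustration, not how any particular tool does it: the function name `transcribe` and the pairing table `DNA_TO_RNA` are made up here, and the example strand is the antisense strand shown in the DNA conventions slide of these notes.

```python
# Watson-Crick pairing used when RNA is built on a DNA template (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3to5):
    """Return the mRNA, written 5'->3', for a DNA template strand written 3'->5'.

    Because the template is read 3'->5' while the RNA grows 5'->3',
    pairing base by base already gives the RNA in the usual orientation.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

template = "TAGCGTTAGTCGATCCAA"   # antisense (template) strand, 3'->5'
mrna = transcribe(template)
print(mrna)
```

The result has the same sequence as the sense strand, with U in place of T.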

How Does it Work?


- RNA polymerase looks for a region on the DNA known as a promoter, where it binds and begins transcription.
- RNA strands are then edited. Some parts (introns) are removed and are not expressed; the parts that remain are called exons, or expressed sequences.

Translation
- During translation, the cell uses information from messenger RNA to produce proteins.
- A: Transcription occurs in the nucleus.
- B: mRNA moves to the cytoplasm and then to the ribosomes. tRNAs read the mRNA and bring the amino acids it codes for.
- C: Ribosomes attach amino acids together, forming a polypeptide chain.
- D: The polypeptide chain keeps growing until a stop codon is reached.

Central Dogma of Molecular Biology


- The flow of information in the cell starts at DNA, which replicates to form more DNA. Information is then transcribed into RNA, and then it is translated into protein. The proteins do most of the work in the cell.
- Information does not flow in the other direction. This is a molecular version of the rejection of the inheritance of acquired characteristics. Changes in proteins do not affect the DNA in a systematic manner (although they can cause random changes in DNA).

Reverse Transcription

- However, a few exceptions to the Central Dogma exist.
- Most importantly, some RNA viruses, called retroviruses, make a DNA copy of themselves using the enzyme reverse transcriptase. The DNA copy incorporates into one of the chromosomes and becomes a permanent feature of the genome. The DNA copy inserted into the genome is called a provirus. This represents a flow of information from RNA to DNA.
- Closely related to retroviruses are retrotransposons, sequences of DNA that make RNA copies of themselves, which then get reverse-transcribed into DNA that inserts into new locations in the genome. Unlike retroviruses, retrotransposons always remain within the cell. They lack genes to make the protein coat that surrounds viruses.

Transcription
- Transcription is the process of making an RNA copy of a single gene. Genes are specific regions of the DNA of a chromosome.
- The enzyme used in transcription is RNA polymerase. There are several forms of RNA polymerase. In eukaryotes, most genes are transcribed by RNA polymerase II.
- The raw materials for the new RNA are the 4 ribonucleoside triphosphates: ATP, CTP, GTP, and UTP. It's the same ATP as is used for energy in the cell.
- As with DNA replication, transcription proceeds 5' to 3': new bases are added to the free 3' OH group.
- Unlike replication, transcription does not need to build on a primer. Instead, transcription starts at a region of DNA called a promoter. For protein-coding genes, the promoter is located a few bases 5' to (upstream from) the first base that is transcribed into RNA. Promoter sequences are very similar to each other, but not identical. If many promoters are compared, a consensus sequence can be derived. All promoters would be similar to this consensus sequence, but not necessarily identical.
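As a rough illustration of how a consensus sequence is derived, the sketch below takes the most common base in each column of a set of aligned sequences. The five short "promoter" sequences are hypothetical, invented for this example, and the function name `consensus` is likewise an assumption.

```python
from collections import Counter

def consensus(aligned):
    """Most common base in each column of equal-length, pre-aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0]   # top base in this column
        for column in zip(*aligned)            # walk the alignment column by column
    )

# Hypothetical aligned promoter fragments (made up for illustration).
promoters = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATACT"]
print(consensus(promoters))
```

Note that three of the five input sequences differ from the consensus at one or more positions, which is the point made above: promoters resemble the consensus without necessarily matching it.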

After Transcription
- In prokaryotes, the RNA copy of a gene is messenger RNA, ready to be translated into protein. In fact, translation starts even before transcription is finished.
- In eukaryotes, the primary RNA transcript of a gene needs further processing before it can be translated. This step is called RNA processing. Also, it needs to be transported out of the nucleus into the cytoplasm.
- Steps in RNA processing:
  1. Add a cap to the 5' end
  2. Add a poly-A tail to the 3' end
  3. Splice out introns

Introns
- Introns are regions within a gene that don't code for protein and don't appear in the final mRNA molecule. Protein-coding sections of a gene (called exons) are interrupted by introns.
- The function of introns remains unclear. They may help in RNA transport or in control of gene expression in some cases, and they may make it easier for sections of genes to be shuffled in evolution. But there is no generally accepted reason for the existence of introns.
- There are a few prokaryotic examples, but most introns are found in eukaryotes.
- Some genes have many long introns: the dystrophin gene (mutants cause muscular dystrophy) has more than 70 introns that make up more than 99% of the gene's sequence. However, not all eukaryotic genes have introns: histone genes, for example, lack introns.

Summary of RNA processing


- In eukaryotes, RNA polymerase produces a primary transcript, an exact RNA copy of the gene.
- A cap is put on the 5' end.
- The RNA is terminated and poly-A is added to the 3' end.
- All introns are spliced out.
- At this point, the RNA can be called messenger RNA. It is then transported out of the nucleus into the cytoplasm, where it is translated.
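The processing steps can be mimicked on a string as a toy model. Everything specific here is an assumption made for illustration: the function name `process_transcript`, the text marker `m7G-` standing in for the 5' cap, the intron coordinates, and the short example transcript. The code splices first and then adds the cap and tail, which gives the same final string as the order listed above.

```python
def process_transcript(primary, introns, tail_len=20):
    """Toy RNA processing: splice out introns, then add a cap marker and poly-A tail.

    `introns` is a list of hypothetical (start, end) positions in `primary`,
    0-based and end-exclusive. The cap is shown only as the text "m7G-".
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(primary[pos:start])   # keep the exon before this intron
        pos = end                          # skip over the intron
    exons.append(primary[pos:])            # keep the final exon
    return "m7G-" + "".join(exons) + "A" * tail_len

# Hypothetical primary transcript: exon, intron, exon.
primary = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
mrna = process_transcript(primary, introns=[(6, 18)], tail_len=5)
print(mrna)
```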

Translation
- Translation of mRNA into protein is accomplished by the ribosome, an RNA/protein hybrid. Ribosomes are composed of 2 subunits, large and small.
- Ribosomes bind to the translation initiation sequence on the mRNA, then move down the RNA in a 5' to 3' direction, creating a new polypeptide. The first amino acid on the polypeptide has a free amino group, so it is called the N-terminal. The last amino acid in a polypeptide has a free acid group, so it is called the C-terminal.
- Each group of 3 nucleotides in the mRNA is a codon, which codes for 1 amino acid. Transfer RNA is the adapter between the 3 bases of the codon and the corresponding amino acid.

Initiation of Translation
- In prokaryotes, ribosomes bind to specific translation initiation sites. There can be several different initiation sites on a messenger RNA: a prokaryotic mRNA can code for several different proteins. Translation begins at an AUG codon, or sometimes a GUG. The modified amino acid N-formyl methionine is always the first amino acid of the new polypeptide.
- In eukaryotes, ribosomes bind to the 5' cap, then move down the mRNA until they reach the first AUG, the codon for methionine. Translation starts from this point. Eukaryotic mRNAs code for only a single gene (although there are a few exceptions, mainly among the eukaryotic viruses).
- Note that translation does not start at the first base of the mRNA. There is an untranslated region at the beginning of the mRNA, the 5' untranslated region (5' UTR).
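The eukaryotic rule described above (scan from the 5' end to the first AUG) can be sketched as a simple string search. The function name `split_at_start_codon` and the example mRNA are hypothetical, and the sketch ignores everything else that influences start-site choice in a real cell.

```python
def split_at_start_codon(mrna):
    """Split an mRNA (5'->3') into its 5' UTR and the part starting at the first AUG."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    return mrna[:start], mrna[start:]

# Hypothetical mRNA used only for illustration.
utr, coding = split_at_start_codon("GGCACCAUGGCUUAA")
print("5' UTR:", utr)
print("from start codon:", coding)
```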

More Initiation
- The initiation process involves first joining the mRNA, the initiator methionine-tRNA, and the small ribosomal subunit. Several initiation factors (additional proteins) are also involved. The large ribosomal subunit then joins the complex.

Elongation
- The ribosome has 2 sites for tRNAs, called P and A. The initial tRNA with attached amino acid is in the P site. A new tRNA, corresponding to the next codon on the mRNA, binds to the A site. The ribosome catalyzes a transfer of the amino acid from the P site onto the amino acid at the A site, forming a new peptide bond.
- The ribosome then moves down one codon. The now-empty tRNA at the P site is displaced off the ribosome, and the tRNA that has the growing peptide chain on it is moved from the A site to the P site.
- The process is then repeated:
  - the tRNA at the P site holds the peptide chain, and a new tRNA binds to the A site.
  - the peptide chain is transferred onto the amino acid attached to the A site tRNA.
  - the ribosome moves down one codon, displacing the empty P site tRNA and moving the tRNA with the peptide chain from the A site to the P site.


Post-Translational Modification
- New polypeptides usually fold themselves spontaneously into their active conformation. However, some proteins are helped and guided in the folding process by chaperone proteins.
- Many proteins have sugars, phosphate groups, fatty acids, and other molecules covalently attached to certain amino acids. Most of this is done in the endoplasmic reticulum.
- Many proteins are targeted to specific organelles within the cell. Targeting is accomplished through signal sequences on the polypeptide. In the case of proteins that go into the endoplasmic reticulum, the signal sequence is a group of amino acids at the N-terminal of the polypeptide, which are removed from the final protein after translation.

CS 177

DNA, RNA, protein overview

DNA RNA Mutations Amino acids, protein structure

Questions about the genome in an organism:
- How much DNA, how many nucleotides?
- How many genes are there?
- What types of proteins appear to be coded by these genes?

Questions about the proteome:
- What proteins are present?
- Where are they?
- When are they present - under what conditions?

Lecture 2
* DNA and its components
* RNA and its components
* Mutations
* Amino acids, review of protein structure

Linking nucleotides

The 3'-OH of one nucleotide is linked to the 5'-phosphate of the next nucleotide.

Hydrogen bonds between paired bases: N-H------N and N-H------O

[Figure: double helix, 2 nm wide, with the 5' and 3' ends of each strand labeled and the bases Adenine, Thymine, Cytosine, and Guanine shown]

Base pairing

Base pairing (Watson-Crick): A/T (2 hydrogen bonds), G/C (3 hydrogen bonds)

Always pairing a purine and a pyrimidine yields a constant width

DNA base composition: A + G = T + C (Chargaff's rule)

[Figure: two antiparallel strands with 5' and 3' ends labeled, showing A-T and C-G base pairs]
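Chargaff's rule follows directly from the pairing rules, which a short check can make concrete. The sketch below builds the partner strand for a sequence and counts bases over both strands; the names `COMPLEMENT` and `duplex_base_counts` are assumptions for this example, and the sequence is the one used in the DNA conventions slide of these notes.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def duplex_base_counts(strand):
    """Count bases over both strands of the duplex formed by `strand` and its partner."""
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)

counts = duplex_base_counts("ATCGCAATCAGCTAGGTT")
purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
print(counts["A"] == counts["T"], counts["G"] == counts["C"], purines == pyrimidines)
```

Because every A on one strand faces a T on the other (and every G a C), the three comparisons hold for any double-stranded sequence, not just this one.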

DNA conventions

1. DNA is a right-handed helix
2. The 5' end is to the left by convention

5'-ATCGCAATCAGCTAGGTT-3'   sense (forward)
3'-TAGCGTTAGTCGATCCAA-5'   antisense (reverse)
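Since sequences are written 5' to 3' by convention, the antisense strand is usually reported as the reverse complement of the sense strand. A minimal sketch follows; the helper name `reverse_complement` is an assumption, and `str.maketrans` / `str.translate` are standard Python string methods.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the partner strand of `seq`, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

sense = "ATCGCAATCAGCTAGGTT"               # 5'->3'
antisense = reverse_complement(sense)      # also 5'->3'
print(antisense)
print(antisense[::-1])                     # same strand written 3'->5', as in the slide
```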

DNA overview

DNA: deoxyribonucleic acid

4 bases: A = Adenine, T = Thymine, C = Cytosine, G = Guanine

Pyrimidine (C4N2H4)

Purine (C5N4H4)

Nucleoside: base + sugar (deoxyribose)

Nucleotide: base + sugar + phosphate

[Figure: a nucleotide, with the phosphate group attached to the 5' CH2 of the sugar, H at the 2' carbon, and OH at the 3' carbon]

Numbering of carbons?

DNA structure

Some more facts:

1. Forces stabilizing DNA structure: Watson-Crick H-bonding and base stacking (planar aromatic bases overlap geometrically and electronically → energy gain)
2. Genomic DNAs are large molecules:
   - Escherichia coli: 4.7 x 10^6 bp; ~1 mm contour length
   - Human: 3.2 x 10^9 bp; ~1 m contour length
3. Some DNA molecules (plasmids) are circular and have no free ends:
   - mtDNA
   - bacterial DNA (only one circular chromosome)
4. An average gene of 1000 bp can code for an average protein of about 330 amino acids
5. Percentage of non-coding DNA varies greatly among organisms

Organism        # Base pairs     # Genes             Non-coding DNA
small virus     4 x 10^3         3                   very little
typical virus   3 x 10^5         200                 very little
bacterium       5 x 10^6         3000                10 - 20%
yeast           1 x 10^7         6000                > 50%
human           3.2 x 10^9       30,000?             99%
amphibians      < 80 x 10^9      ?                   ?
plants          < 900 x 10^9     23,000 - >50,000    > 99%
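The contour lengths and the gene-to-protein size estimate above can be checked with a little arithmetic. The sketch assumes B-form DNA with a rise of about 0.34 nm per base pair, a value that is not stated in the slides; the function names are made up for the example.

```python
RISE_PER_BP_NM = 0.34  # assumption: B-form DNA rises about 0.34 nm per base pair

def contour_length_m(base_pairs):
    """Approximate contour length of a DNA molecule, in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9

def max_codons(gene_length_bp):
    """Number of complete codons that fit in a gene of the given length."""
    return gene_length_bp // 3

print(f"E. coli: {contour_length_m(4.7e6) * 1e3:.1f} mm")
print(f"Human:   {contour_length_m(3.2e9):.2f} m")
print(f"1000 bp gene: up to {max_codons(1000)} codons")
```

Under this assumption the human genome comes out at about 1.1 m and the E. coli genome at about 1.6 mm, the same order of magnitude as the rounded figures quoted above, and 1000 bp gives 333 codons, in line with the figure of about 330 amino acids.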

RNA structure
RNA: ribonucleic acid

3 major types of RNA:
- messenger RNA (mRNA): template for protein synthesis
- transfer RNA (tRNA): adaptor molecules that decode the genetic code
- ribosomal RNA (rRNA): catalyzing the synthesis of proteins

4 bases: A = Adenine, U = Uracil, C = Cytosine, G = Guanine

Pyrimidine (C4N2H4)

Purine (C5N4H4)

Thymine (DNA) is replaced by Uracil (RNA)

Nucleoside: base + sugar (ribose)

Nucleotide: base + sugar + phosphate

[Figure: a ribonucleotide, with the phosphate group attached to the 5' CH2 of the sugar and OH groups at both the 2' and 3' carbons]

Base interactions in RNA

Base pairing: U/A (U takes the place of T; 2 hydrogen bonds), G/C (3 hydrogen bonds)

RNA base composition: A + G ≠ U + C; Chargaff's rule does not apply (RNA usually occurs as a single strand)

RNA structure:
- usually single stranded
- many self-complementary regions → RNA commonly exhibits an intricate secondary structure (relatively short, double helical segments alternated with single stranded regions)
- complex tertiary interactions fold the RNA in its final three dimensional form
- the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking)

RNA structure

Primary structure

A) single stranded regions: formed by unpaired nucleotides

Secondary structure

B) duplex: double helical RNA (A-form with 11 bp per turn)
C) hairpin: duplex bridged by a loop of unpaired nucleotides
D) internal loop: nucleotides not forming Watson-Crick base pairs
E) bulge loop: unpaired nucleotides in one strand, other strand has contiguous base pairing
F) junction: three or more duplexes separated by single stranded regions
G) pseudoknot: tertiary interaction between bases of hairpin loop and outside bases

[Figure: RNA fold with elements A to F labeled]

[Figure: RNA primary, secondary, and tertiary structure, with structural elements A to F labeled]

RNA structure
How to predict RNA secondary/tertiary structure?

Probing RNA structure experimentally:
- physical methods (single crystal X-ray diffraction, electron microscopy)
- chemical and enzymatic methods
- mutational analysis (introduction of specific mutations to test change in some function or protein-RNA interaction)

Thermodynamic prediction of RNA structure:
- RNA molecules obey the laws of thermodynamics, so it should be possible to deduce RNA structure from its sequence by finding the conformation with the lowest free energy
- Pros: only one sequence required; no difficult experiments; does not rely on alignments
- Cons: thermodynamic data are experimentally determined, but not always accurate; possible interactions of RNA with solvent, ions, and proteins are not accounted for
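To give a feel for sequence-based prediction, here is a deliberately simplified stand-in: a Nussinov-style dynamic programme that maximises the number of nested Watson-Crick pairs. It is not a free-energy model (real thermodynamic predictors use measured stacking and loop energies), and the function name, the `min_loop` parameter, and the example sequence are assumptions for this sketch.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def max_base_pairs(seq, min_loop=3):
    """Largest number of nested Watson-Crick pairs in `seq`.

    Every hairpin must enclose at least `min_loop` unpaired bases.
    best[i][j] holds the answer for the subsequence seq[i..j].
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):            # distance between i and j
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: base j stays unpaired
            for k in range(i, j - min_loop):       # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCCC"))   # hypothetical hairpin-forming sequence
```

Counting pairs treats every pair as equally favourable, which is the main thing an energy-based method improves on.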

Comparative determination of RNA structure:
- basic assumption: secondary structure of a functional RNA will be conserved in the evolution of the molecule (at least more conserved than the primary structure); when a set of homologous sequences has a certain structure in common, this structure can be deduced by comparing the structures possible from their sequences
- Pros: very powerful in finding secondary structure, relatively easy to use, only sequences required, not affected by interactions of the RNA and other molecules
- Cons: large number of sequences to study preferred, structure constraints in fully conserved regions cannot be inferred, extremely variable regions cause problems with alignment

Amino acids/proteins

The central dogma of modern biology: DNA → RNA → protein

Getting from DNA to protein: Two parts:
1. Transcription, in which a short portion of chromosomal DNA is used to make an RNA molecule small enough to leave the nucleus.
2. Translation, in which the RNA code is used to assemble the protein at the ribosome

The genetic code
- The code problem: 4 nucleotides in RNA, but 20 amino acids in proteins
- Bases are read in groups of 3 (= a codon)
- The code consists of 64 codons (4^3 = 64)
- All codons are used in protein synthesis:
  - 20 amino acids
  - 3 stop codons
- AUG (methionine) is the start codon (also used internally)
- The code is non-overlapping and punctuation-free
- The code is degenerate (but NOT ambiguous): most amino acids are specified by more than one codon, but each codon specifies only one amino acid
- The code is universal (virtually all organisms use the same code)
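The counting claims above can be verified by building the standard codon table programmatically. The sketch relies on the common compact encoding of the standard code (one letter per codon, bases in U, C, A, G order, `*` for stop); the variable names are assumptions for this example.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code: one amino acid letter per codon, codons in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_aa = Counter(CODON_TABLE.values())
amino_acids = sorted(aa for aa in codons_per_aa if aa != "*")

# Amino acids with exactly one codon.
single_codon = sorted(aa for aa in amino_acids if codons_per_aa[aa] == 1)

# Amino acids fixed by the first two bases alone: four codons sharing one prefix.
prefixes = defaultdict(set)
for codon, aa in CODON_TABLE.items():
    prefixes[aa].add(codon[:2])
two_base_only = sorted(
    aa for aa in amino_acids
    if codons_per_aa[aa] == 4 and len(prefixes[aa]) == 1
)

print(len(CODON_TABLE), len(amino_acids), codons_per_aa["*"])
print(single_codon, two_base_only)
```

Because `CODON_TABLE` is a dictionary keyed by codon, each codon maps to exactly one amino acid (not ambiguous), while `codons_per_aa` shows several codons per amino acid (degenerate).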

The genetic code

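Base 1   Base 2 = T         Base 2 = C     Base 2 = A       Base 2 = G      Base 3
T        Phenylalanine F    Serine S       Tyrosine Y       Cysteine C      T
T        Phenylalanine F    Serine S       Tyrosine Y       Cysteine C      C
T        Leucine L          Serine S       STOP             STOP            A
T        Leucine L          Serine S       STOP             Tryptophan W    G
C        Leucine L          Proline P      Histidine H      Arginine R      T
C        Leucine L          Proline P      Histidine H      Arginine R      C
C        Leucine L          Proline P      Glutamine Q      Arginine R      A
C        Leucine L          Proline P      Glutamine Q      Arginine R      G
A        Isoleucine I       Threonine T    Asparagine N     Serine S        T
A        Isoleucine I       Threonine T    Asparagine N     Serine S        C
A        Isoleucine I       Threonine T    Lysine K         Arginine R      A
A        Methionine M       Threonine T    Lysine K         Arginine R      G
G        Valine V           Alanine A      Aspartate D      Glycine G       T
G        Valine V           Alanine A      Aspartate D      Glycine G       C
G        Valine V           Alanine A      Glutamate E      Glycine G       A
G        Valine V           Alanine A      Glutamate E      Glycine G       G

In-class exercise
1. Which amino acids are specified by single codons? methionine and tryptophan
2. How many amino acids are specified by the first two nucleotides only? five: proline, threonine, valine, alanine, glycine
3. What is the RNA code for the start codon? AUG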

Amino acids

[Figure slides: hydrophobic amino acids; hydrophilic amino acids]

Mutations

A single amino acid substitution in a protein causes sickle-cell disease


Review of protein structure

Making a polypeptide chain

Primary structure
Proteins are chains of amino acids joined by peptide bonds (polypeptide chain)

[Figure: the structure of two amino acids]

- The N-Cα-C sequence is repeated throughout the protein, forming the backbone
- The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds determine the conformation of the protein backbone
- The R side chains also play an important structural role

Review of protein structure


Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids

Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration:
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- each R side group has a limited volume to occupy and a limited number of interactions with other R side groups

[Figure panels: α helix; β sheet]

Reading frames
Reading frame (also open reading frame): The stretch of triplet sequence of DNA that potentially encodes a protein. The reading frame is designated by the initiation or start codon and is terminated by a stop codon.
- a reading frame is not always easily recognizable
- each strand of RNA/DNA has three possible starting points (position one, two, or three):

Position 1   CAG AUG AGG UCA GGC AUA      gln met arg ser gly ile
Position 2   C AGA UGA GGU CAG GCA UA     arg STOP gly gln ala
Position 3   CA GAU GAG GUC AGG CAU A     asp glu val arg his

- mutations within an open reading frame that delete or add nucleotides can disrupt the reading frame (frameshift mutation):

Wildtype   CAG AUG AGG UCA GGC AUA GAG   gln met arg ser gly ile glu
Mutant     CAG AUG AGU CAG GCA UAG AG    gln met ser gln ala STOP

Up to 30% of mutations causing human disease are due to premature termination of translation (nonsense mutations or frameshift)
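The three reading frames and the frameshift example can be reproduced with a small translator. As above, the codon table uses the compact one-letter encoding of the standard code, with `*` marking a stop; the function name `translate_frame` is an assumption, and the frameshift is modelled as a deletion of one G from the AGG codon.

```python
from itertools import product

# Standard genetic code, codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

def translate_frame(rna, offset):
    """Translate every complete codon starting at `offset` (0, 1, or 2)."""
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3)
    )

rna = "CAGAUGAGGUCAGGCAUA"
for offset in range(3):
    print("Position", offset + 1, translate_frame(rna, offset))

wildtype = "CAGAUGAGGUCAGGCAUAGAG"
mutant = wildtype[:8] + wildtype[9:]   # delete one G from the AGG codon
print(translate_frame(wildtype, 0))
print(translate_frame(mutant, 0))
```

The translator does not stop at `*`, so a stop codon shows up in the output exactly where a ribosome would terminate.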

Mutations
Mutation: any heritable change in DNA

Sources of mutation:
- Spontaneous mutations: mutations occur for unknown reasons
- Induced mutations: exposure to substance (mutagen) known to cause mutations, e.g. X-rays, UV light, free radicals

Mutations may influence one or several base pairs
a) Nucleotide substitutions (point mutation)
   1) Transitions (Pu ↔ Pu; Py ↔ Py)
   2) Transversions (Pu ↔ Py)
b) Insertion or deletion ("indels")
   - one to many bases can be involved
   - frequently associated with repeated sequences ("hot spots")
   - lead to frameshift in protein-coding genes, except when N = 3X
   - also caused by insertion of transposable elements into genes

In-class exercise
How many transition and transversion events are possible?
2 transitions: T ↔ C; A ↔ G
4 transversions: T ↔ A; T ↔ G; C ↔ A; C ↔ G

Weighting of mutation events plays an important role in phylogenetic analyses (model of sequence evolution)
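The answer to the in-class exercise can be confirmed by enumerating every unordered pair of DNA bases and classifying it. The function name `substitution_type` is an assumption for this sketch.

```python
from itertools import combinations

PURINES = {"A", "G"}   # C and T are the pyrimidines

def substitution_type(a, b):
    """Classify a substitution between two different bases."""
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"

pairs = list(combinations("ACGT", 2))   # the 6 unordered base pairs
transitions = [p for p in pairs if substitution_type(*p) == "transition"]
transversions = [p for p in pairs if substitution_type(*p) == "transversion"]
print(len(transitions), transitions)
print(len(transversions), transversions)
```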

Mutations

Mutations may influence phenotype
a) Silent (or synonymous) substitution
   - nucleotide substitution without amino acid change
   - no effect on phenotype
   - mostly third codon position
   - other possible silent substitutions: changes in non-coding DNA
b) Replacement substitution
   - causes amino acid change
   - neutral: protein still functions normally
   - missense: protein loses some functions (e.g. sickle cell anemia: mutation in β-globin)
c) Sense/nonsense substitution
   - sense: involves a change from a termination codon to one that codes for an amino acid
   - nonsense: creates premature termination codon

Mutation rates = a measure of the frequency of a given mutation per generation
- mutation rates are usually given for specific loci (e.g. sickle cell anemia)
- the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000 nucleotides per generation
- range varies from 1 in 10,000 to 1 in 10,000,000,000
- every human has about 30 new mutations involving nucleotide substitutions
- mutation rate is about twice as high in male as in female meiosis
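The figure of about 30 new substitutions per person can be roughly reproduced from the per-nucleotide rate. This back-of-the-envelope sketch assumes the rate applies uniformly to a single (haploid) copy of the genome of 3.2 x 10^9 bp, the size quoted earlier in these notes; the variable names are made up here.

```python
rate_per_site = 1e-8       # about 1 substitution per 100,000,000 nucleotides per generation
genome_size_bp = 3.2e9     # assumption: one haploid copy of the human genome

expected_new_substitutions = rate_per_site * genome_size_bp
print(round(expected_new_substitutions))
```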

Secondary structure

Other secondary structure elements (no standardized classification)
- random coil
- loop
- others (e.g. 3₁₀ helix, β-hairpin, paperclip)

Super-secondary structure
- In addition to secondary structure elements that apply to all proteins (e.g. α helix, β sheet) there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function

Secondary structure

Structural classification of proteins (SCOP)

Class 1: mainly alpha

Class 2: mainly beta

Class 3: alpha/beta

Class 4: few secondary structures


Secondary structure

Alternative SCOP

Class α: only α helices

Class β: antiparallel β sheets

Class α/β: mainly β sheets with intervening α helices

Class α+β: mainly segregated α helices with antiparallel β sheets

Membrane structure: hydrophobic α helices with membrane bilayers

Multidomain: contain more than one class

Review of protein structure

Q: If we have all the Psi and Phi angles in a protein, do we then have enough information to describe the 3-D structure?

A: No, because the detailed packing of the amino acid side chains is not revealed from this information. However, the Psi and Phi angles do determine the entire secondary structure of a protein


Tertiary structure

The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide

The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, hydrogen bonding, hydrophobic interactions, van der Waals forces) between the side chains

Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability

If a protein consists of only one polypeptide chain, this level then describes the complete structure

Tertiary structure

Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have an elongated structure with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation

Quaternary structure

The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic interactions, hydrogen bonds, and hydrophobic interactions.
Example: Haemoglobin (4 subunits)

Structure displays

Common displays are (among others) cartoon, spacefill, and backbone

[Figure panels: cartoon, spacefill, backbone]

Summary protein structure

Primary structure: Sequence of amino acids

Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids

Tertiary structure: Organization in three dimensions of all the atoms in the polypeptide

Quaternary structure: Conformation assumed by a multimeric protein

The four levels of protein structure are hierarchical: each level of the build process is dependent upon the one below it

Next week

First quiz

Lecture 1
- Bioinformatics definitions
- The human genome project

Lecture 2
- DNA structure
- RNA structure
- Mutations
- Amino acids
- Proteins