Protein Synthesis

Transcription
- RNA molecules are produced by copying part of the nucleotide sequence of DNA into a complementary sequence in RNA, a process called transcription.
- During transcription, RNA polymerase binds to DNA and separates the DNA strands. RNA polymerase then uses one strand of DNA as a template from which nucleotides are assembled into a strand of mRNA.
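The template-copying step can be illustrated with a minimal Python sketch. It assumes the template strand is written 3' to 5', so the mRNA comes out 5' to 3'; the function name `transcribe` and the example sequence are hypothetical.

```python
# Pairing rules used when an RNA strand is built on a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') built on a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Hypothetical template strand, written 3'->5'.
template = "TACGGCATT"
print(transcribe(template))
```

Note that uracil (U) takes the place of thymine in the RNA product.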
How Does it Work?

- RNA polymerase looks for a region on the DNA known as a promoter, where it binds and begins transcription.
- RNA strands are then edited. Some parts (introns) are removed and are not expressed; the parts that remain are called exons, or expressed sequences.
Translation
- During translation, the cell uses information from messenger RNA to produce proteins.
- A: Transcription occurs in the nucleus.
- B: mRNA moves to the cytoplasm and then to the ribosomes. tRNAs read the mRNA codons and bring in the amino acids they code for.
- C: Ribosomes attach amino acids together, forming a polypeptide chain.
- D: The polypeptide chain keeps growing until a stop codon is reached.
Central Dogma of Molecular Biology

- The flow of information in the cell starts at DNA, which replicates to form more DNA. Information is then transcribed into RNA, and then it is translated into protein. The proteins do most of the work in the cell.
- Information does not flow in the other direction. This is a molecular version of the principle that acquired characteristics are not inherited. Changes in proteins do not affect the DNA in a systematic manner (although they can cause random changes in DNA).
Reverse Transcription
- However, a few exceptions to the Central Dogma exist.
- Most importantly, some RNA viruses, called retroviruses, make a DNA copy of themselves using the enzyme reverse transcriptase. The DNA copy incorporates into one of the chromosomes and becomes a permanent feature of the genome. The DNA copy inserted into the genome is called a provirus. This represents a flow of information from RNA to DNA.
- Closely related to retroviruses are retrotransposons, sequences of DNA that make RNA copies of themselves, which then get reverse-transcribed into DNA that inserts into new locations in the genome. Unlike retroviruses, retrotransposons always remain within the cell. They lack genes to make the protein coat that surrounds viruses.
Transcription
- Transcription is the process of making an RNA copy of a single gene. Genes are specific regions of the DNA of a chromosome.
- The enzyme used in transcription is RNA polymerase. There are several forms of RNA polymerase. In eukaryotes, most genes are transcribed by RNA polymerase II.
- The raw materials for the new RNA are the 4 ribonucleoside triphosphates: ATP, CTP, GTP, and UTP. It's the same ATP as is used for energy in the cell.
- As with DNA replication, transcription proceeds 5' to 3': new bases are added to the free 3' OH group.
- Unlike replication, transcription does not need to build on a primer. Instead, transcription starts at a region of DNA called a promoter. For protein-coding genes, the promoter is located a few bases 5' to (upstream from) the first base that is transcribed into RNA.
- Promoter sequences are very similar to each other, but not identical. If many promoters are compared, a consensus sequence can be derived. All promoters would be similar to this consensus sequence, but not necessarily identical.
After Transcription
- In prokaryotes, the RNA copy of a gene is messenger RNA, ready to be translated into protein. In fact, translation starts even before transcription is finished.
- In eukaryotes, the primary RNA transcript of a gene needs further processing before it can be translated. This step is called RNA processing. Also, it needs to be transported out of the nucleus into the cytoplasm.
- Steps in RNA processing:
  1. Add a cap to the 5' end
  2. Add a poly-A tail to the 3' end
  3. Splice out introns
Introns
- Introns are regions within a gene that don't code for protein and don't appear in the final mRNA molecule. Protein-coding sections of a gene (called exons) are interrupted by introns.
- The function of introns remains unclear. They may help in RNA transport or in control of gene expression in some cases, and they may make it easier for sections of genes to be shuffled in evolution. But no generally accepted reason for the existence of introns exists.
- There are a few prokaryotic examples, but most introns are found in eukaryotes.
- Some genes have many long introns: the dystrophin gene (mutants cause muscular dystrophy) has more than 70 introns that make up more than 99% of the gene's sequence. However, not all eukaryotic genes have introns: histone genes, for example, lack introns.
Summary of RNA processing

- In eukaryotes, RNA polymerase produces a primary transcript, an exact RNA copy of the gene.
- A cap is put on the 5' end.
- The RNA is terminated and poly-A is added to the 3' end.
- All introns are spliced out.
- At this point, the RNA can be called messenger RNA. It is then transported out of the nucleus into the cytoplasm, where it is translated.
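The processing steps summarized above can be sketched as string operations in Python. This is only an illustration under stated assumptions: the intron coordinates are supplied by hand (0-based, end-exclusive), the cap is shown as a text marker "m7G-", and the tail length is arbitrary. The function name `process_transcript` and the example transcript are hypothetical.

```python
def process_transcript(primary: str, introns: list[tuple[int, int]], tail_len: int = 20) -> str:
    """Splice out introns, then add a 5' cap marker and a 3' poly-A tail.

    introns holds (start, end) spans, 0-based and end-exclusive (an assumption
    of this sketch; real splice sites are recognized by sequence signals).
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(primary[pos:start])  # keep the exon before this intron
        pos = end                         # skip over the intron
    exons.append(primary[pos:])           # keep the final exon
    spliced = "".join(exons)
    return "m7G-" + spliced + "A" * tail_len

# Hypothetical primary transcript: exon, intron (12 bases), exon.
primary = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
print(process_transcript(primary, [(6, 18)], tail_len=5))
```

The order of operations in the code is chosen for simplicity and is not meant to reflect the timing of the steps in the cell.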
Translation
- Translation of mRNA into protein is accomplished by the ribosome, an RNA/protein hybrid. Ribosomes are composed of 2 subunits, large and small.
- Ribosomes bind to the translation initiation sequence on the mRNA, then move down the RNA in a 5' to 3' direction, creating a new polypeptide. The first amino acid on the polypeptide has a free amino group, so it is called the N-terminal. The last amino acid in a polypeptide has a free acid group, so it is called the C-terminal.
- Each group of 3 nucleotides in the mRNA is a codon, which codes for 1 amino acid. Transfer RNA is the adapter between the 3 bases of the codon and the corresponding amino acid.
Initiation of Translation
- In prokaryotes, ribosomes bind to specific translation initiation sites. There can be several different initiation sites on a messenger RNA: a prokaryotic mRNA can code for several different proteins. Translation begins at an AUG codon, or sometimes a GUG. The modified amino acid N-formyl methionine is always the first amino acid of the new polypeptide.
- In eukaryotes, ribosomes bind to the 5' cap, then move down the mRNA until they reach the first AUG, the codon for methionine. Translation starts from this point. Eukaryotic mRNAs code for only a single protein (although there are a few exceptions, mainly among the eukaryotic viruses).
- Note that translation does not start at the first base of the mRNA. There is an untranslated region at the beginning of the mRNA, the 5' untranslated region (5' UTR).
More Initiation
- The initiation process involves first joining the mRNA, the initiator methionine-tRNA, and the small ribosomal subunit. Several initiation factors (additional proteins) are also involved. The large ribosomal subunit then joins the complex.
Elongation
- The ribosome has 2 main sites for tRNAs, called P and A (a third, the E site, briefly holds the empty tRNA as it exits). The initial tRNA with attached amino acid is in the P site. A new tRNA, corresponding to the next codon on the mRNA, binds to the A site. The ribosome catalyzes a transfer of the amino acid from the P site onto the amino acid at the A site, forming a new peptide bond.
- The ribosome then moves down one codon. The now-empty tRNA at the P site is displaced off the ribosome, and the tRNA that has the growing peptide chain on it is moved from the A site to the P site.
- The process is then repeated:
  - the tRNA at the P site holds the peptide chain, and a new tRNA binds to the A site.
  - the peptide chain is transferred onto the amino acid attached to the A site tRNA.
  - the ribosome moves down one codon, displacing the empty P site tRNA and moving the tRNA with the peptide chain from the A site to the P site.
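The repeated elongation cycle amounts to reading one codon at a time and extending the chain until a stop codon appears. The sketch below is a simplification under stated assumptions: it uses the standard genetic code, starts at the first AUG it finds (the eukaryotic scanning rule described earlier), and ignores tRNAs and ribosomal sites entirely. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, one-letter amino acids, "*" = stop.
# Codons are ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                          # no start codon, nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":              # stop codon: release the chain
            break
        peptide.append(amino_acid)         # one peptide bond per cycle
    return "".join(peptide)

# Hypothetical mRNA with a 2-base 5' UTR: AUG GCC UUU UAA
print(translate("GGAUGGCCUUUUAAGG"))
```

The returned string is written from the N-terminal to the C-terminal, matching the 5' to 3' direction of reading.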
Post-Translational Modification
- New polypeptides usually fold themselves spontaneously into their active conformation. However, some proteins are helped and guided in the folding process by chaperone proteins.
- Many proteins have sugars, phosphate groups, fatty acids, and other molecules covalently attached to certain amino acids. Most of this is done in the endoplasmic reticulum.
- Many proteins are targeted to specific organelles within the cell. Targeting is accomplished through signal sequences on the polypeptide. In the case of proteins that go into the endoplasmic reticulum, the signal sequence is a group of amino acids at the N-terminal of the polypeptide, which are removed from the final protein after translation.
CS 177
DNA, RNA, protein overview
DNA RNA Mutations Amino acids, protein structure
Questions about the genome in an organism:
- How much DNA, how many nucleotides?
- How many genes are there?
- What types of proteins appear to be coded by these genes?
Questions about the proteome:
- What proteins are present?
- Where are they?
- When are they present, and under what conditions?
Lecture 2
- DNA and its components
- RNA and its components
- Mutations
- Amino acids, review of protein structure
Linking nucleotides
[Figure: two antiparallel DNA strands with 5' and 3' ends labeled; the bases adenine, thymine, cytosine, and guanine are joined by N-H···N and N-H···O hydrogen bonds; helix width 2 nm]
The 3'-OH of one nucleotide is linked to the 5'-phosphate of the next nucleotide.
Base pairing
Base pairing (Watson-Crick): A/T (2 hydrogen bonds), G/C (3 hydrogen bonds)
Always pairing a purine with a pyrimidine yields a constant width.
DNA base composition: A + G = T + C (Chargaff's rule)
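Chargaff's rule follows directly from Watson-Crick pairing, which a short Python check can make concrete. This is a sketch under the assumption of a perfectly paired duplex; the function name `duplex_composition` and the example sequence are hypothetical.

```python
from collections import Counter

# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def duplex_composition(strand: str) -> Counter:
    """Count the bases over both strands of a duplex built from one strand."""
    partner = strand.translate(COMPLEMENT)
    return Counter(strand + partner)

counts = duplex_composition("GATTACAGGC")  # hypothetical sequence
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(counts["A"] + counts["G"] == counts["T"] + counts["C"])  # purines = pyrimidines
```

The rule holds for the duplex as a whole; a single strand taken on its own need not satisfy it.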
DNA conventions
1. DNA is a right-handed helix
2. The 5' end is to the left by convention
   5'-ATCGCAATCAGCTAGGTT-3'   sense (forward)
   3'-TAGCGTTAGTCGATCCAA-5'   antisense (reverse)
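The relationship between the sense and antisense strands can be reproduced with a reverse complement, sketched here in Python. The function name `reverse_complement` is hypothetical; the sequence is the one from the convention example.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def reverse_complement(seq: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]

sense = "ATCGCAATCAGCTAGGTT"
antisense = reverse_complement(sense)
print(antisense)        # antisense strand read 5'->3'
print(antisense[::-1])  # same strand written 3'->5', aligned under the sense strand
```

Reversing is needed because the two strands run antiparallel: the base paired with the 5' end of one strand sits at the 3' end of the other.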
DNA overview
DNA: deoxyribonucleic acid
4 bases: A = Adenine, T = Thymine, C = Cytosine, G = Guanine
Pyrimidine (C4N2H4)
Purine (C5N4H4)
Nucleoside: base + sugar (deoxyribose)
Nucleotide: base + sugar + phosphate
[Figure: deoxyribose sugar with carbons numbered 1' to 5', a phosphate group on the 5' carbon, and an OH group on the 3' carbon]
Numbering of carbons?
DNA structure
Some more facts:
1. Forces stabilizing DNA structure: Watson-Crick H-bonding and base stacking (planar aromatic bases overlap geometrically and electronically → energy gain)
2. Genomic DNAs are large molecules:
   Escherichia coli: 4.7 x 10^6 bp; ~ 1 mm contour length
   Human: 3.2 x 10^9 bp; ~ 1 m contour length
3. Some DNA molecules (plasmids) are circular and have no free ends:
   - mtDNA
   - bacterial DNA (only one circular chromosome)
4. Average gene of 1000 bp can code for average protein of about 330 amino acids
5. Percentage of non-coding DNA varies greatly among organisms

Organism        # Base pairs     # Genes             Non-coding DNA
small virus     4 x 10^3         3                   very little
typical virus   3 x 10^5         200                 very little
bacterium       5 x 10^6         3000                10 - 20%
yeast           1 x 10^7         6000                > 50%
human           3.2 x 10^9       30,000?             99%
amphibians      < 80 x 10^9      ?                   ?
plants          < 900 x 10^9     23,000 - >50,000    > 99%
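The size figures in facts 2 and 4 can be checked with simple arithmetic, sketched here in Python. The rise of 0.34 nm per base pair is an assumption taken from standard B-form DNA geometry, not from these notes; the function names are hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumed rise per base pair in B-form DNA, in nanometres

def contour_length_m(base_pairs: float) -> float:
    """Length in metres of a fully stretched duplex of the given size."""
    return base_pairs * RISE_PER_BP_NM * 1e-9

def max_codons(gene_bp: int) -> int:
    """Number of whole codons that fit in a coding sequence of this length."""
    return gene_bp // 3

print(contour_length_m(4.7e6))  # E. coli, in metres
print(contour_length_m(3.2e9))  # human, in metres
print(max_codons(1000))         # codons in a 1000 bp gene
```

Under this assumption the E. coli genome comes out on the order of a millimetre and the human genome at about a metre, in line with the rounded values quoted above.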
RNA structure
RNA: ribonucleic acid
3 major types of RNA:
- messenger RNA (mRNA): template for protein synthesis
- transfer RNA (tRNA): adaptor molecules that decode the genetic code
- ribosomal RNA (rRNA): catalyzes the synthesis of proteins
4 bases: A = Adenine, U = Uracil, C = Cytosine, G = Guanine
Pyrimidine (C4N2H4)
Purine (C5N4H4)
Thymine (DNA) is replaced by uracil (RNA)
Nucleoside: base + sugar (ribose)
Nucleotide: base + sugar + phosphate
[Figure: ribose sugar with carbons numbered 1' to 5', a phosphate group on the 5' carbon, and OH groups on the 2' and 3' carbons]
Base interactions in RNA
Base pairing: U/A/(T) (2 hydrogen bonds), G/C (3 hydrogen bonds)
RNA base composition: A + G ≠ U + C; Chargaff's rule does not apply (RNA usually occurs as a single strand)
RNA structure:
- usually single stranded
- many self-complementary regions → RNA commonly exhibits an intricate secondary structure (relatively short double-helical segments alternating with single-stranded regions)
- complex tertiary interactions fold the RNA into its final three-dimensional form
- the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking)
RNA structure
Primary structure
A) single-stranded regions: formed by unpaired nucleotides
Secondary structure
B) duplex: double-helical RNA (A-form with 11 bp per turn)
C) hairpin: duplex bridged by a loop of unpaired nucleotides
D) internal loop: nucleotides not forming Watson-Crick base pairs
E) bulge loop: unpaired nucleotides in one strand, other strand has contiguous base pairing
F) junction: three or more duplexes separated by single-stranded regions
Tertiary structure
G) pseudoknot: tertiary interaction between bases of hairpin loop and outside bases
[Figure: folded RNA with the structural elements A to F labeled]
RNA structure
How to predict RNA secondary/tertiary structure?
Probing RNA structure experimentally:
- physical methods (single crystal X-ray diffraction, electron microscopy)
- chemical and enzymatic methods
- mutational analysis (introduction of specific mutations to test change in some function or protein-RNA interaction)
Thermodynamic prediction of RNA structure:
- RNA molecules comply with the laws of thermodynamics, therefore it should be possible to deduce RNA structure from its sequence by finding the conformation with the lowest free energy
- Pros: only one sequence required; no difficult experiments; does not rely on alignments
- Cons: thermodynamic data are experimentally determined, but not always accurate; possible interactions of RNA with solvent, ions, and proteins are not accounted for
Comparative determination of RNA structure:
- basic assumption: secondary structure of a functional RNA will be conserved in the evolution of the molecule (at least more conserved than the primary structure); when a set of homologous sequences has a certain structure in common, this structure can be deduced by comparing the structures possible from their sequences
- Pros: very powerful in finding secondary structure; relatively easy to use; only sequences required; not affected by interactions of the RNA with other molecules
- Cons: a large number of sequences is preferred; structure constraints in fully conserved regions cannot be inferred; extremely variable regions cause problems with alignment
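To give a feel for sequence-based prediction, here is a minimal Python sketch of base-pair maximization by dynamic programming (the Nussinov approach). It is a deliberately simplified stand-in for thermodynamic prediction: it counts pairs instead of computing free energies, allows only A-U and G-C pairs, and assumes a minimum hairpin loop of 3 unpaired bases. The function name `max_base_pairs` and the test sequence are hypothetical.

```python
# Allowed pairs in this sketch (G-U wobble pairs are left out for simplicity).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (no pseudoknots)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum pairs within the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: base j stays unpaired
            for k in range(i, j - min_loop):       # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCCC"))  # hypothetical hairpin-forming sequence
```

Energy-based programs follow the same recursive idea but score stacks and loops with measured free-energy parameters, which is where the accuracy caveats listed above come in.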
Amino acids/proteins
The central dogma of modern biology: DNA → RNA → protein

Getting from DNA to protein, in two parts:
1. Transcription, in which a short portion of chromosomal DNA is used to make an RNA molecule small enough to leave the nucleus
2. Translation, in which the RNA code is used to assemble the protein at the ribosome
The genetic code
- The code problem: 4 nucleotides in RNA, but 20 amino acids in proteins
- Bases are read in groups of 3 (= a codon)
- The code consists of 64 codons (4^3 = 64)
- All codons are used in protein synthesis:
  - 20 amino acids
  - 3 stop codons
  - AUG (methionine) is the start codon (also used internally)
- The code is non-overlapping and punctuation-free
- The code is degenerate (but NOT ambiguous): most amino acids are specified by more than one codon, but each codon specifies only one amino acid
- The code is universal (virtually all organisms use the same code)
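The codon count can be verified by enumerating every triplet, as in this small Python sketch. The stop codons listed are those of the standard code; the variable names are hypothetical.

```python
from itertools import product

# Every ordered triplet of the 4 RNA bases.
codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]

STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard code
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 4 ** 3 triplets in total
print(len(sense_codons))  # triplets left over to specify the 20 amino acids
```

Pairs of bases would give only 4^2 = 16 combinations, too few for 20 amino acids, which is why triplets are the smallest workable codon size.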
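The genetic code

Base 1   Base 2: T          C             A               G              Base 3
T        Phenylalanine F    Serine S      Tyrosine Y      Cysteine C     T
T        Phenylalanine F    Serine S      Tyrosine Y      Cysteine C     C
T        Leucine L          Serine S      STOP            STOP           A
T        Leucine L          Serine S      STOP            Tryptophan W   G
C        Leucine L          Proline P     Histidine H     Arginine R     T
C        Leucine L          Proline P     Histidine H     Arginine R     C
C        Leucine L          Proline P     Glutamine Q     Arginine R     A
C        Leucine L          Proline P     Glutamine Q     Arginine R     G
A        Isoleucine I       Threonine T   Asparagine N    Serine S       T
A        Isoleucine I       Threonine T   Asparagine N    Serine S       C
A        Isoleucine I       Threonine T   Lysine K        Arginine R     A
A        Methionine M       Threonine T   Lysine K        Arginine R     G
G        Valine V           Alanine A     Aspartate D     Glycine G      T
G        Valine V           Alanine A     Aspartate D     Glycine G      C
G        Valine V           Alanine A     Glutamate E     Glycine G      A
G        Valine V           Alanine A     Glutamate E     Glycine G      G

In-class exercise
1. Which amino acids are specified by single codons? Methionine and tryptophan
2. How many amino acids are specified by the first two nucleotides only? Five: proline, threonine, valine, alanine, glycine
3. What is the RNA code for the start codon? AUG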
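The exercise answers can be checked mechanically from the standard code, as in this Python sketch. It assumes the usual compact encoding of the table (bases in the order T, C, A, G at each position, "*" for stop); the variable names are hypothetical. For question 2 it looks for amino acids whose codons all share the same first two bases and cover all four third bases.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in DNA letters; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons_for = defaultdict(list)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    codons_for[amino_acid].append("".join(codon))

# Question 1: amino acids with exactly one codon.
single_codon = sorted(aa for aa, cs in codons_for.items() if len(cs) == 1)

# Question 2: amino acids fully determined by the first two bases.
first_two_only = sorted(
    aa for aa, cs in codons_for.items()
    if aa != "*" and len(cs) == 4 and len({c[:2] for c in cs}) == 1
)

print(single_codon)
print(first_two_only)
```

Leucine, serine, and arginine also own a complete four-codon box each, but they have two further codons elsewhere in the table, so the first two bases alone do not capture them.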
Amino acids
[Figure: hydrophobic amino acids]
[Figure: hydrophilic amino acids]
Mutations
A single amino acid substitution in a protein causes sickle-cell disease
Review of protein structure
Making a polypeptide chain
Primary structure
- Proteins are chains of amino acids joined by peptide bonds (polypeptide chain)
[Figure: the structure of two amino acids]
- The N-Cα-C sequence is repeated throughout the protein, forming the backbone
- The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds determine the conformation of the protein backbone
- The R side chains also play an important structural role

Secondary structure: interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration:
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- each R side group has a limited volume to occupy and a limited number of interactions with other R side groups
[Figure: α helix and β sheet]
Reading frames
Reading frame (also open reading frame): the stretch of triplet sequence of DNA that potentially encodes a protein. The reading frame is designated by the initiation or start codon and is terminated by a stop codon.
- a reading frame is not always easily recognizable
- each strand of RNA/DNA has three possible starting points (position one, two, or three):
  Position 1   CAG AUG AGG UCA GGC AUA      gln met arg ser gly ile
  Position 2   C AGA UGA GGU CAG GCA UA     arg STOP gly gln ala (UGA is a stop codon in the standard code)
  Position 3   CA GAU GAG GUC AGG CAU A     asp glu val arg his
- mutations within an open reading frame that delete or add nucleotides can disrupt the reading frame (frameshift mutation):
  Wildtype     CAG AUG AGG UCA GGC AUA GAG  gln met arg ser gly ile glu
  Mutant       CAG AUG AGU CAG GCA UAG AG   gln met ser gln ala STOP
Up to 30% of mutations causing human disease are due to premature termination of translation (nonsense mutations or frameshifts)
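The three reading frames and the effect of a single-base deletion can be reproduced with a short Python sketch. It assumes the standard genetic code and prints one-letter amino acid codes with "*" for a stop codon; the function name `translate_frame` is hypothetical, and the sequences are the ones from the example above.

```python
from itertools import product

# Standard genetic code in RNA letters; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_frame(seq: str, frame: int) -> str:
    """Translate every whole codon of seq, starting at offset frame (0, 1, or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

seq = "CAGAUGAGGUCAGGCAUA"
for frame in range(3):
    print(frame + 1, translate_frame(seq, frame))

# Frameshift: deleting one G shifts every downstream codon.
wildtype = "CAGAUGAGGUCAGGCAUAGAG"
mutant = wildtype[:7] + wildtype[8:]  # drop the base at index 7
print(translate_frame(wildtype, 0))
print(translate_frame(mutant, 0))
```

Unlike a real ribosome, this sketch keeps translating past stop codons so that every codon in the frame is visible.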
Mutations
Mutation: any heritable change in DNA
Sources of mutation:
- Spontaneous mutations: mutations occur for unknown reasons
- Induced mutations: exposure to a substance or agent (mutagen) known to cause mutations, e.g. X-rays, UV light, free radicals
Mutations may influence one or several base pairs
a) Nucleotide substitutions (point mutations)
   1) Transitions (Pu ↔ Pu; Py ↔ Py)
   2) Transversions (Pu ↔ Py)
   In-class exercise: How many transition and transversion events are possible?
   - 2 transitions: T ↔ C; A ↔ G
   - 4 transversions: T ↔ A; T ↔ G; C ↔ A; C ↔ G
b) Insertion or deletion ("indels")
   - one to many bases can be involved
   - frequently associated with repeated sequences ("hot spots")
   - lead to frameshift in protein-coding genes, except when N = 3X (the number of bases involved is a multiple of 3)
   - also caused by insertion of transposable elements into genes
Weighting of mutation events plays an important role in phylogenetic analyses (model of sequence evolution)
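The transition and transversion counts from the exercise can be confirmed by classifying every unordered pair of bases, as in this Python sketch. The function name `classify` is hypothetical.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(a: str, b: str) -> str:
    """Transition if both bases are in the same chemical class, else transversion."""
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

# Every unordered pair of distinct bases is one possible substitution type.
pairs = list(combinations("ACGT", 2))
transitions = [p for p in pairs if classify(*p) == "transition"]
transversions = [p for p in pairs if classify(*p) == "transversion"]

print(transitions)
print(transversions)
```

Counting ordered changes (A to G separately from G to A) would double both numbers while keeping the same 1:2 ratio.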
Mutations
Mutations may influence phenotype
a) Silent (or synonymous) substitution
   - nucleotide substitution without amino acid change
   - no effect on phenotype
   - mostly third codon position
   - other possible silent substitutions: changes in non-coding DNA
b) Replacement substitution
   - causes amino acid change
   - neutral: protein still functions normally
   - missense: protein loses some functions (e.g. sickle cell anemia: mutation in β-globin)
c) Sense/nonsense substitution
   - sense: involves a change from a termination codon to one that codes for an amino acid
   - nonsense: creates premature termination codon
Mutation rates = a measure of the frequency of a given mutation per generation
- mutation rates are usually given for specific loci (e.g. sickle cell anemia)
- the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000 nucleotides
- range varies from 1 in 10,000 to 1 in 10,000,000,000
- every human has about 30 new mutations involving nucleotide substitutions
- mutation rate is about twice as high in male as in female meiosis
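The figure of about 30 new mutations per person follows from the substitution rate and the genome size, as this Python arithmetic sketch shows. It assumes the rate applies uniformly per nucleotide per generation and uses the haploid human genome size of 3.2 x 10^9 bp quoted earlier in these notes; the variable names are hypothetical.

```python
SUBSTITUTION_RATE = 1e-8  # substitutions per nucleotide per generation (order of magnitude)
GENOME_SIZE_BP = 3.2e9    # haploid human genome size in base pairs

# Expected number of new substitutions in one haploid genome copy.
expected_new_mutations = SUBSTITUTION_RATE * GENOME_SIZE_BP
print(round(expected_new_mutations))
```

Because the rate is only an order-of-magnitude estimate, the result should be read as "a few tens", not as a precise count.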
Secondary structure
Other secondary structure elements (no standardized classification)
- random coil
- loop
- others (e.g. 3_10 helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements that apply to all proteins (e.g. helix, sheet), there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function
Secondary structure
Structural classification of proteins (SCOP)
- Class 1: mainly alpha
- Class 2: mainly beta
- Class 3: alpha/beta
- Class 4: few secondary structures
Alternative SCOP
- Class α: only α helices
- Class β: antiparallel β sheets
- Class α/β: mainly β sheets with intervening α helices
- Class α+β: mainly segregated α helices with antiparallel β sheets
- Membrane structure: hydrophobic α helices within membrane bilayers
- Multidomain: contain more than one class
Q: If we have all the Psi and Phi angles in a protein, do we then have enough information to describe the 3-D structure?
A: No, because the detailed packing of the amino acid side chains is not revealed by this information. However, the Psi and Phi angles do determine the entire secondary structure of a protein.
Tertiary structure
- The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
- The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, H-bonding, hydrophobic interactions, van der Waals forces) between the side chains
- Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability
- If a protein consists of only one polypeptide chain, this level then describes the complete structure
Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have an elongated structure with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, H-bond, and hydrophobic interactions.
Example: Haemoglobin (4 subunits)
Structure displays
Common displays are (among others) cartoon, spacefill, and backbone
[Figure: the same protein shown in cartoon, spacefill, and backbone display]
Summary protein structure
Primary structure: Sequence of amino acids
Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids
Tertiary structure: Organization in three dimensions of all the atoms in the polypeptide
Quaternary structure: Conformation assumed by a multimeric protein
The four levels of protein structure are hierarchical: each level of the build process is dependent upon the one below it
Next week
First quiz
Lecture 1
- Bioinformatics definitions
- The human genome project
Lecture 2
- DNA structure
- RNA structure
- Mutations
- Amino acids
- Proteins
