
Structure of Nucleic Acid


5' end = phosphate group
3' end = OH group

linked by phosphodiester bond

Bases
-Purines: adenine and guanine
-Pyrimidines: cytosine, uracil and thymine

-Natural base pairing: A-T, C-G, A-U (in RNA only)
-In lab: G-T, C-T
-G-U commonly exists in double-helical regions of otherwise single-stranded RNA

C-G = 3 hydrogen bonds
A-T = 2 hydrogen bonds
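
As a rough illustration of the counts above, the sketch below totals the Watson-Crick hydrogen bonds in a duplex and reports its G-C fraction. The function names and the example sequence are hypothetical, and the sketch assumes a fully paired DNA duplex containing only standard A-T and C-G pairs.

```python
# Hydrogen bonds contributed by each base when it sits in a standard
# Watson-Crick pair (A-T = 2, C-G = 3), as listed in the notes.
H_BONDS_PER_BASE = {"A": 2, "T": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(strand):
    """Total hydrogen bonds when `strand` is fully paired with its complement."""
    return sum(H_BONDS_PER_BASE[base] for base in strand.upper())


def gc_fraction(strand):
    """Fraction of positions that are G or C (more G-C means more H bonds)."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


if __name__ == "__main__":
    seq = "ATGCGC"  # hypothetical example sequence
    print(duplex_hydrogen_bonds(seq), gc_fraction(seq))
```

A duplex with a higher G-C fraction has more hydrogen bonds per base pair, which is one reason the G-C:A-T ratio matters for the melting temperature.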

DNA structure

-Major groove (good binding surface) and minor groove
-DNA in cells is right handed

-B DNA: found in cells,
-A DNA: RNA-DNA, RNA-RNA (wider and shorter
than B DNA)
-When H2O is removed, B DNA becomes A DNA

-DNA is flexible about its axis (no H bonds parallel to
the axis, unlike protein alpha helices)
-TBP bends DNA

Melting temperature (Tm): temperature at which half of the bases in DNA are denatured
-pH, ion [ ] and G-C:A-T affect Tm
-change in absorption is due to hyperchromicity
-Note that ssDNA absorption still increases b/c of self-binding (e.g. hairpin)

Denaturation and Renaturation

Denaturation of dsDNA can be achieved by raising its temperature, by reducing ionic
concentration (since positive ions shield negatively charged phosphate groups and
therefore decrease repulsive forces allowing renaturation), by extremes of pH, or by adding
agents that destabilize hydrogen bonds (formamide or urea).

ssDNA can renature into dsDNA when these conditions are reversed. Renaturation
depends upon base-pairing and thus the ssDNA strands must have complementary
sequence.

Nucleic acid hybridization depends upon these properties.



Topoisomerase I: relieves torsional strain


-binds to DNA, breaks phosphodiester bond in one strand (nick), causes loss of supercoil,
ligates the two ends of the broken strand (therefore it removes 1 twist)

Topoisomerase II: same as I but acts on 2 strands and therefore removes 2 twists (more
efficient but more dangerous)

Hairpin: 5-10 nucleotides


Stem-loop: 10 to several 100 nucleotides


Transcription Overview and mRNA formation





primary RNA transcript = RNA formed right after transcription (e.g. pre-mRNA)
Coding strand = non-template strand
Non-coding strand = template strand
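
The strand naming above can be sketched in a few lines: the transcript has the same sequence as the coding strand (with U instead of T), and the template strand is its reverse complement. The helper names and the example sequence are hypothetical; all sequences are written 5' to 3'.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_strand(coding):
    """Template (non-coding) strand, written 5'->3': reverse complement of coding."""
    return "".join(COMPLEMENT[base] for base in reversed(coding.upper()))


def primary_transcript(coding):
    """RNA made from the template: same sequence as the coding strand, U for T."""
    return coding.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCC"  # hypothetical coding-strand fragment
    print(template_strand(coding), primary_transcript(coding))
```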






-Multiple polymerases can transcribe the same template DNA strand at the same time
-Elongation complex is very stable; the polymerase doesn't fall off until it reaches the stop site
-Speed of elongation is approx. 1000 nt/min; small genes take minutes, big genes can take hours
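
To make the "minutes versus hours" remark concrete, here is a small sketch that converts a gene length into an approximate transcription time using the ~1000 nt/min figure quoted above. The gene lengths in the example are hypothetical, and the calculation ignores pausing and initiation time.

```python
ELONGATION_RATE_NT_PER_MIN = 1000  # approximate rate quoted in the notes


def transcription_minutes(gene_length_nt, rate=ELONGATION_RATE_NT_PER_MIN):
    """Approximate time (minutes) for one polymerase to traverse the gene."""
    return gene_length_nt / rate


if __name__ == "__main__":
    for length in (2_000, 1_200_000):  # hypothetical small and large genes
        minutes = transcription_minutes(length)
        print(length, "nt:", minutes, "min =", minutes / 60, "h")
```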


-TFIID is composed of the TATA box binding protein (TBP) and 13 TBP-associated factors (TAFs)
-The whole complex that is ready to initiate is called the preinitiation complex
-TBP binds first. It bends the DNA. Then TFIIB makes contact with TBP and DNA. Then a
preformed complex of TFIIF (4 subunits) and RNA Pol II binds. It positions Pol II over the
start site. TFIIE (4 subunits) comes in and creates a docking site for TFIIH (10 subunits).
-TFIIH has 2 very important subunits:
o One contains a helicase, which uses ATP to unwind the DNA duplex at the start site
(FORMS OPEN COMPLEX)
o One contains a kinase, which phosphorylates the Pol II CTD multiple times
-The complex dissociates, except for TBP, which remains on DNA.

Transcription initiation requires formation of the pre-initiation complex.

The pre-initiation complex contains many general transcription factors (TFII, for factors
associated with RNA polymerase II) in addition to RNA polymerase.

Transcription initiation can be studied in vitro (in a test tube with defined components),
and in vivo using genetic techniques.

In Vivo Transcription Initiation by RNA Polymerase II Requires Additional Proteins

o TFIIA is needed to associate with TBP and TATA box DNA. TFIIA interacts with the part of TBP
that faces upstream relative to the direction of transcription.
o The TAF subunits of TFIID function in initiating transcription from promoters that lack
a TATA box.

Elongation Factors Regulate the Initial Stages of Transcription in the Promoter-Proximal
Region

Key Concepts
o RNA Pol II initiates transcription of genes at the nucleotide in the DNA template
that corresponds to the 5' nucleotide that is capped in the encoded mRNA
o In metazoans, NELF associates with Pol II after initiation, inhibiting elongation about 50-200 bp
from the transcription start site. Inhibition of elongation is relieved by further phosphorylation.










Organization of genes is different in prokaryotes and eukaryotes



Glossary
Operon: in genetics, an operon is a functioning unit of genomic DNA containing a cluster
of genes under the control of a single regulatory signal or promoter.
Endonuclease: enzymes that cleave the phosphodiester bond within a polynucleotide
chain.
Exonuclease: enzymes that work by cleaving nucleotides one at a time from the end (exo)
of a polynucleotide chain.

-In prokaryotes, genes with a common function are often arranged linearly in operons and
transcribed together on a single mRNA. This will then be translated into several different
proteins that share similar functions. There are very few non-coding gaps of DNA in
prokaryotic genomes. Translation can occur at the same time as transcription.

-In eukaryotes, genes can be scattered over different chromosomes.

RNA processing:

-mRNA 5' cap (consists of a 7-methyl-G joined by a 5'-5' link) protects RNA from degradation
-poly A tail (100 to 250 A)


UTRs help tell other enzymes how to protect, transport mRNA

The same primary transcript can be alternatively spliced in different tissues allowing for
different proteins (isoforms).





Regulation of Prokaryotic Gene Expression



Promoter: region of DNA that initiates the transcription of a particular gene.

Regulation of the production of a protein usually occurs during transcription.
Transcription is either repressed or activated (either little or no mRNA is produced or up
to 1000x more is synthesized).

Unicellular versus multicellular organisms

Single-celled organisms

Genes are regulated to adjust to changes in the nutritional and physical
environment. A cell usually produces only the proteins required for survival and
proliferation under the particular environmental conditions it experiences.


Multicellular organisms
Genes are regulated to ensure coordination during embryonic development and
tissue differentiation.

Sigma factor (σ): binds to RNA polymerase and is necessary to initiate transcription.
Sigma factors recognize promoter DNA sequences and recruit RNA polymerase. After
transcription is initiated, sigma factors are released, as they are no longer required.
σ70 is the most common; it recognizes TTGACA (-35 region) and TATAAT (-10 region). σ54
recognizes promoters of genes involved in nitrogen metabolism; its consensus sequence is
very different.

Operator: segment of DNA to which a transcription factor protein binds.

Consensus sequence: is the calculated order of most frequent residues, either nucleotide
or amino acid, found at each position in a sequence alignment.
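
The definition above translates directly into a column-by-column majority vote. The sketch below is a minimal version that assumes the sequences are already aligned to the same length and contain no gaps; the function name and the example sequences are hypothetical. If two residues tie at a position, the one seen first in that column is chosen.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Most frequent residue at each column of equal-length aligned sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # top residue in this column
        for column in zip(*aligned_sequences)
    )


if __name__ == "__main__":
    # Hypothetical aligned -10 region variants
    print(consensus(["TATAAT", "TATGAT", "TACAAT", "GATAAT"]))
```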



Example: lac operon (genes necessary for the metabolism of lactose)

In the absence of lactose: lac repressor binds to the operator and prevents Pol-σ70 from
binding to the promoter. Transcription is said to be repressed.

In the presence of lactose: lactose binds to lac repressor, changing its conformation and
causing it to dissociate from the operator. This allows Pol-σ70 to bind to the promoter
sequence and initiate transcription. Transcription is said to be de-repressed.

Glucose is a better energy source than lactose. When glucose levels are low, metabolism of
lactose is favored. In response to low levels of glucose, E. coli synthesizes cyclic AMP (cAMP),
which binds to, and activates, a transcriptional activator protein called CAP. CAP binds
to the CAP site and interacts with RNA polymerase, increasing the rate of transcription.
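
The two controls on the lac operon (repressor release by lactose, activation by cAMP-CAP when glucose is low) can be summarized as a small truth table. This is a simplified sketch: the function name and the labels "repressed", "low" and "high" are illustrative assumptions, and real expression levels are continuous rather than three discrete states.

```python
def lac_operon_state(lactose_present, glucose_present):
    """Qualitative transcription state of the lac operon."""
    repressor_on_operator = not lactose_present  # lactose releases the repressor
    cap_active = not glucose_present             # low glucose -> cAMP -> active CAP
    if repressor_on_operator:
        return "repressed"
    # De-repressed: CAP decides whether the rate is boosted.
    return "high" if cap_active else "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            print(lactose, glucose, lac_operon_state(lactose, glucose))
```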





RNA Polymerases

Three eukaryotic RNA polymerases:



RNA polymerase I
Located in the nucleolus

Transcribes only precursor rRNA


RNA polymerase II
Transcribes mRNAs and four snRNAs that take part in RNA splicing
Its largest subunit has a CTD tail which needs to be phosphorylated to be actively
transcribing.
RNA polymerase III
Transcribes tRNA, 5S rRNA, and other small stable RNAs including one involved in
RNA splicing.

Amanitin ???







Regulation of Eukaryotic Gene Expression



Promoter Elements

-TATA box: part of the promoter region. Genes that are transcribed at high levels (have strong
promoters) have a TATA box starting 35 bp upstream of the start site. It
functions like an E. coli promoter by positioning RNA polymerase for transcription initiation. It
is NOT found in prokaryotes. The directionality of the sequence is necessary for normal
transcription (see RNA polymerases lecture for how the TATA box regulates transcription)


-Initiator element: promoter element that includes a C at the -1 position and an A at the +1
position. No good consensus sequence has been defined.



-CpG Island: the p refers to the phosphodiester bond between C and G, not to C-G base pairing. They contain
a CG-rich stretch of 20-50 bp within ~100 bp of the start-site region. They are relatively
uncommon. They are important for genes transcribed at low rates, such as those for metabolism. They
can regulate transcription in both directions (known as divergent transcription) although
one direction is favored. It is thought that general transcription factors bind to them,
which allows them to regulate transcription.




Enhancer Elements

Promoter-proximal elements: sequences around 6-10 bp long, located around 100-200 bp upstream
from the promoter, that control transcription (textbook says they can
also be found downstream). They can be cell type specific (i.e. a promoter-proximal
element in one cell type can have a much larger impact on transcription than the same element
in a different cell type). Transcription factors bind to promoter-proximal elements.

Linker-scanning mutation: insert a sequence that is thought to contain promoter-proximal
elements (the control region) into a vector containing a minimal promoter (e.g. a TATA box)
and a reporter gene. Then systematically replace overlapping chunks of the control region
using restriction enzymes. By analyzing how transcription of the reporter gene mRNA varies,
one can assess which parts of the control region are promoter-proximal
elements.

Deletion analysis: similar to linker-scanning mutation, but instead of systematically
replacing chunks of nucleotides within a sequence, you shorten the sequence.
Linker-scanning mutation is therefore more specific, since it tells you exactly which
sequences are necessary for transcription.


5' deletion analysis is a method where upstream base pairs are systematically removed. For
example, the first construct spans -20 to +1, the second -15 to +1, the third -10 to +1, and so on. These
segments are then joined to a reporter gene and transfected into E. coli cells. The E. coli
cells are used to transcribe and translate the reporter genes. Cell extracts are then prepared and
reporter gene expression is measured.
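
The construct series described above can be generated mechanically: each construct keeps the same 3' end and loses a fixed number of bases from the 5' end. The sketch below assumes the control region is given as a plain string written 5' to 3'; the function name, step size and example sequence are hypothetical.

```python
def five_prime_deletion_series(control_region, step):
    """Progressively 5'-truncated constructs, longest first.

    Every construct ends at the same 3' position; each one is `step`
    bases shorter at its 5' end than the previous one.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [control_region[start:] for start in range(0, len(control_region), step)]


if __name__ == "__main__":
    region = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt control region
    for construct in five_prime_deletion_series(region, 5):
        print(len(construct), construct)
```

Each construct would then be placed upstream of the reporter gene; the shortest construct that still gives full reporter expression marks the 5' boundary of the required sequence.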


Enhancers: can be tens of kilobases upstream or downstream of the gene. The directionality
of the enhancer sequence is not important for regulating transcription. Enhancers can be
excised and inserted elsewhere and still regulate transcription. Although an enhancer may be
far from a promoter region in terms of sequence, it may be physically close to the
promoter region since DNA is packed into chromatin (interestingly, enhancers are common in
eukaryotes but fairly rare in prokaryotes). The distinction between promoter-proximal
elements and enhancers is not very clear. It should be thought of as a spectrum.

Upstream Activating Sequence (UAS): found in yeast genes. Similar to an enhancer. It
binds the transcription factor GAL4. Note that the TATA box in yeast is 90 bp upstream
from the start site. QUESTION: promoters, promoter-proximal elements and enhancers do
not have specific sequences depending on the gene that they regulate? (Note that the same
enhancer won't work in just any kind of cell, but that is because of the proteins present.) Is it the
different types of transcription factors that bind to them that allow for specificity of gene
transcription? Will a single gene be regulated by several promoters? No. Several
enhancers?



Techniques used to purify proteins:

-Ion-exchange chromatography (based on charge of protein)
-Gel filtration chromatography (based on size of protein)
-Affinity chromatography (based on specific interaction e.g. antibody-antigen)


Techniques used to identify regulatory proteins:

DNase I footprinting: if a protein is bound to a certain sequence of DNA, that sequence
cannot be digested by nucleases. By using a nuclease that cuts randomly (but cuts in only one
location per DNA molecule; this is ensured by the concentration of DNase I), fragments
of various sizes are produced. These fragments can then be separated according to their size
by gel electrophoresis. Since the sequence that is bound by a protein will not be cut, bands
of the corresponding sizes will be missing. This will appear as a footprint on the
gel. The missing bands indicate the sequence of DNA where the protein was bound.
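
The logic of the footprint can be simulated with a toy model. The sketch assumes the DNA is labelled at one end (an assumption not stated in the notes), so that a single cut after position i gives a visible fragment of length i. Cuts inside the protein-bound region do not occur, so those lengths are absent from the gel. The function names and coordinates are hypothetical.

```python
def footprint_bands(dna_length, protected):
    """Fragment lengths visible on the gel when a protein protects `protected`.

    `protected` is a range of 1-based cut positions shielded by the protein.
    """
    return [pos for pos in range(1, dna_length) if pos not in protected]


def missing_bands(dna_length, protected):
    """Fragment lengths absent from the gel: the footprint itself."""
    present = set(footprint_bands(dna_length, protected))
    return [pos for pos in range(1, dna_length) if pos not in present]


if __name__ == "__main__":
    # Hypothetical 10-bp probe with positions 4-6 covered by the protein
    print(footprint_bands(10, range(4, 7)))
    print(missing_bands(10, range(4, 7)))
```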


Electrophoretic mobility shift assay (EMSA) or gel shift: a segment of DNA that has a
protein bound to it will migrate more slowly in a gel than DNA alone. Although it is better for
quantitative analysis of DNA-binding proteins, it does not provide the sequence of the binding
site.

Typical experiment for purifying a transcription factor

-Map the binding site using DNase I footprinting. This allows you to find the sequence to which
the binding protein will bind, but you still won't know what the protein is.
-Synthesize a DNA sequence containing multiple copies of the binding site and couple it to
beads.
-Incubate the nuclear extract with the beads. These will serve as your column for affinity
chromatography. Separate DNA-binding proteins by affinity chromatography. This will provide you
with several fractions, some containing the binding protein and some without it.
Use EMSA to detect which fractions have the DNA-binding protein.
-Verify whether the protein can stimulate transcription by first doing an in vitro assay
followed by a co-transfection assay (in this case a plasmid containing the gene for the
transcription factor and a plasmid containing a reporter gene which is regulated by an
enhancer that binds with the transcription factor).

Transcription Factors

Transcription factors: proteins that stimulate or repress transcription by binding to
promoter-proximal elements and enhancers in eukaryotic DNA. They contain a single
DNA-binding domain and one or more activation domains (for activators) or repression
domains (for repressors). Activation and repression domains interact with other
proteins.

-By deleting segments coding for a transcription factor (e.g. GAL4), one can create
truncated proteins and determine both binding sites (i.e. can the truncated protein bind to
the enhancer) & activating sites (i.e. will transcription occur).

-Internal deletion mutants: sequence which codes for amino acids in between the binding
site and the activation site (i.e. an intervening sequence). The intervening sequence is not
necessary for binding and activation under laboratory conditions, but is thought to be
useful for the 3D structure, which allows activation domain to come into contact with
chromatin remodeling complexes (which slide the nucleosomes around on the DNA which
is necessary for the initiation of transcription)

-Domain-swapping experiment: a DNA-binding domain from one transcription factor can be
fused with the activation domain of a second protein. This results in a functional
transcription factor, which can regulate the expression of a gene.

-Transcription repressors: very few are known (because they are hard to study experimentally) and
they are thought to be rare. Their repression domain interacts with chromatin remodeling
complexes, which pack DNA up and prevent other transcription factors from initiating
transcription.


DNA binding domains

-DNA-binding domains have structural sequence motifs that bind to specific DNA
sequences (therefore, transcription factors are specific to enhancers, but how is that
possible if enhancers aren't specific??)


-Proteins that bind to DNA usually do so by forming noncovalent bonds between atoms in
an α-helix in the DNA-binding domain (protein) and atoms on the edges of the bases within
a major groove of the DNA. MORE?

Classes of DNA-binding proteins:

-Zinc Finger Proteins:
It folds around a central Zn2+ ion, producing a compact domain for a relatively short length
of the polypeptide chain.

C2H2 zinc finger: Most common DNA-binding motif in humans. Binding of a zinc ion by
two cysteine (C) and two histidine (H) residues compacts the domain and allows insertion
of the α-helix into the major groove of the DNA. These proteins usually contain 3 or more repeating
finger units and bind as monomers (i.e. a transcription factor will contain multiple
C2H2 zinc fingers)

C4 zinc finger: Composed of four cysteines. Much less common, they are found in ~50
human transcription factors of the nuclear receptor family. These proteins generally
contain only two such units but bind as homodimers. These have two-fold rotational
symmetry (bind in mirror image) and bind to consensus DNA sequences that are
inverted repeats.



purple: alpha helix, green beta sheets, black zinc, not all fingers bind
(a) 1 large protein (b) 2 identical proteins




-Leucine-Zipper Proteins
Consensus has a leucine residue at every seventh position. Leucine is hydrophobic;
they tend to interact with one another.
Bind DNA as dimers either homodimers or heterodimers but often as heterodimers.
Related proteins have a different repeated hydrophobic amino acid (i.e. not leucine but
another hydrophobic amino acid); basic zipper (bZip) is the term for the larger family
of proteins.
-Look like scissors.


- Basic Helix-Loop-Helix (bHLH)
Similar to basic zipper except a non-helical loop separates two α-helical regions.

Different bHLH proteins can form heterodimers (they can still form homodimers)
-Dimerization domain (binding of both proteins) requires hydrophobic repeat

Regulatory Diversity is ensured by:

-Heterodimers: for example 3 different transcription factors (monomers) can make 6
dimers (3+2+1), 4 can make 10 (4+3+2+1).

-Inhibitory factors: interact with a specific monomer (a bZip or HLH) and block DNA
binding. Blocking the expression of a gene must also be considered as increasing
regulatory diversity.

-Cooperative binding: genes often require more than one transcription factor to be
regulated. Two transcription factors will bind to nearby sites. Alone, the interaction
will be too weak for transcription. However, if both transcription factors bind,
robust transcription will occur. Both transcription factors can actually interact with
each other.
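
The dimer arithmetic in the list above (3 monomers give 6 dimers, 4 give 10) is the count of unordered pairs with repetition, n(n+1)/2, which includes homodimers. The sketch below enumerates the pairs and checks the formula; the monomer names are hypothetical, and it assumes every pair can actually form, which is not true of all real transcription factors.

```python
from itertools import combinations_with_replacement


def possible_dimers(monomers):
    """All unordered pairs of monomers, homodimers included."""
    return list(combinations_with_replacement(monomers, 2))


def dimer_count(n_monomers):
    """Closed form for the number of distinct dimers: n(n+1)/2."""
    return n_monomers * (n_monomers + 1) // 2


if __name__ == "__main__":
    factors = ["A", "B", "C"]  # hypothetical monomer names
    print(possible_dimers(factors))
    print(dimer_count(3), dimer_count(4))
```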



Activation Domains

Less sequence consensus than for DNA-binding domains. They are much more
heterogeneous than binding-domain (since they interact with a variety of proteins).

Many activation domains have a high percentage of one or two particular amino acids
(Asp, Glu, Gln, Pro, Ser, Thr).

Acidic activation domains (those with Asp or Glu) are active when bound to a protein
co-activator. They are the ones that are best understood.

Example 1: CREB must be phosphorylated to bind its co-activator CBP, which
changes its conformation and makes it an active transcription factor.


Example 2: the activation domain of the estrogen receptor has to be bound to estrogen to be
in an active conformation. The estrogen receptor is an intrinsically unstructured
protein (so X-ray crystallography cannot be performed on it) but assumes a specific
shape when bound to its ligand (estrogen). This conformational change allows it
to bind to a co-activator, which starts a cascade of events that leads to
transcription.
Tamoxifen emulates estrogen but does not allow the changes in conformation that
lead to transcription. Therefore, it is an antagonist.





Co-activator: A co-activator is a protein that increases gene expression by binding to
an activator (or simply a transcription factor) which contains a DNA binding domain. The
co-activator is unable to bind DNA by itself.


Enhanceosome : a protein complex that binds to the enhancer region of a gene.

-Interferon: a cell infected with a virus emits interferon (a protein) to signal to a
neighboring cell, which will turn on β-interferon in response. The gene coding for
β-interferon is regulated by the β-interferon enhancer, which is bound by a protein complex
and thus forms an enhanceosome.



Mediator complex: large protein complex (bigger than ribosome) that forms a molecular
bridge between activation domains of a transcription factor and RNA Pol II. There is a head
region, a middle region and a tail region. Conserved in shape but not necessarily in
sequence.


Co-activator of transcription (therefore, it does not bind with DNA itself!)
Activation domain-mediator interactions stimulate assembly of the pre-initiation complex at
a promoter
Head and middle domains interact directly with RNA pol II subunits
Other mediator domains interact with activation domains of activators




Post-Transcriptional Mechanisms

Pre-mRNA processing: it is necessary to protect the pre-mRNA and to make it ready for
translation (occurs in the nucleus). Three major events happen. Since these events happen
simultaneously with transcription they are said to be co-transcriptional:

-5' capping
7-methylguanosine is added to the 5' end of the nascent mRNA when it is 25-30 nt long.
This protects the RNA from digestion (by 5' exoribonucleases).
This is catalyzed by a dimeric capping enzyme that associates with the CTD of RNA Pol
II. One subunit removes the γ-phosphate from the 5' end of the RNA, and the other
transfers GMP from GTP to the 5' diphosphate of the nascent transcript.
Separate enzymes then transfer methyl groups to the N7 position of the guanine.
This is the first thing to happen to nascent RNA. Polymerase essentially waits for the
nascent RNA to be capped before elongating at a rapid rate.



-RNA splicing

For short mRNA sequences, splicing will occur after 3' cleavage and polyadenylation. For
longer sequences, splicing will occur at the same time as transcription. The location of
splice sites (i.e. intron-exon junctions) in a pre-mRNA can be determined by comparing the
sequences of cDNA (prepared from the corresponding mRNA) and genomic DNA. The
sequences present in genomic DNA but absent in the cDNA represent introns and indicate
splice sites. There are moderately conserved short consensus sequences at the splice
sites flanking introns in eukaryotic pre-mRNA (GU at the 5' end and AG at the 3' end of the
intron). A pyrimidine-rich (C and U) region just upstream of the 3' splice site is common.
There is also a branch point A that is highly conserved.
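
The cDNA-versus-genomic comparison described above can be sketched for the simplest case of a gene with exactly one intron. The function tries every way of splitting the cDNA into two exons, keeps splits where exon 1 matches the start and exon 2 the end of the genomic sequence, and then requires the leftover middle to begin with GT and end with AG (the DNA equivalents of GU and AG). The function name and the example sequences are hypothetical, and real genes need a proper spliced-alignment tool.

```python
def find_single_intron(genomic, cdna):
    """Candidate introns for a gene assumed to contain exactly one intron."""
    genomic, cdna = genomic.upper(), cdna.upper()
    candidates = []
    for split in range(1, len(cdna)):
        exon1, exon2 = cdna[:split], cdna[split:]
        if not (genomic.startswith(exon1) and genomic.endswith(exon2)):
            continue
        # Whatever lies between the two exons in the genomic DNA
        intron = genomic[split:len(genomic) - len(exon2)]
        # Keep only candidates with the conserved boundary dinucleotides
        if intron.startswith("GT") and intron.endswith("AG"):
            candidates.append(intron)
    return candidates


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCC", "GTAAGTCTAACTTTTTCAG", "GATTGA"  # hypothetical
    print(find_single_intron(exon1 + intron + exon2, exon1 + exon2))
```

The GT/AG check matters because the raw comparison is often ambiguous by a base or two when the intron and the neighboring exon start with the same nucleotide.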





Introns are removed as a lariat structure in which the 5' G of the intron is joined in an
unusual 2',5'-phosphodiester bond to an adenosine near the 3' end of the intron. This A
residue is called the branch point A because it forms an RNA branch in the lariat structure. In
each of the two transesterification reactions, one phosphoester bond is exchanged for another. Since
the number of phosphoester bonds is not changed in either reaction, no energy is
consumed. The net result of these two reactions is that the two exons are ligated and the
intervening intron is released as a branched lariat structure.

Small nuclear RNAs (snRNAs): small RNAs that can base pair with pre-mRNA
and bind proteins. Important ones are: U1, U2, U4, U5 and U6. They are rich in uracil. Each
snRNA is associated with 6-10 proteins. U1 snRNA recognizes the 5' splice site of the
pre-mRNA (the sequence includes part of the intron and exon). Mutations of these sequences
prevent RNA splicing. U2 recognizes the branch point but does not actually base pair with
the branch point A, which is necessary for the first transesterification reaction. Again,
mutations of these sequences in the pre-mRNA prevent splicing.

Small nuclear ribonucleoprotein particles (snRNPs): composed of snRNAs and proteins. U1, U2,
U4, U5 and U6 each have their corresponding snRNP (i.e. U1 snRNA with other proteins forms U1
snRNP). snRNPs bind with other proteins to form spliceosomes.


Spliceosome: formed by snRNP and other proteins. Binds with pre-mRNA and performs
splicing.

SR proteins: proteins rich in serine (S) and arginine (R). They interact with sequences
within exons called exonic splicing enhancers. They are a subset of hnRNP proteins and
contain one or more RRM RNA-binding domains. When bound to exonic splicing
enhancers, SR proteins mediate the cooperative binding of U1 snRNP to a true 5' splice site
and U2 snRNP to a branch point through a network of protein-protein interactions that
span an exon.

Self-splicing introns exist (i.e. the RNA can splice itself in the absence of proteins). Two types of
self-splicing introns are known (group I and group II). They can be found in mitochondria
and chloroplasts. It is thought that the snRNAs involved in splicing evolved from
self-splicing introns (such as group II)





-3' cleavage/polyadenylation

Poly(A) polymerase (aka PAP) binds to the pre-mRNA and stimulates cleavage at the
poly(A) site. PAP adds around 12 A residues slowly, but it takes the binding of
poly(A)-binding protein II (PABPII) to the initial short poly(A) tail to accelerate the
rate of addition by PAP. The final tail is around 200-250 residues in length. PABPII
signals to PAP to stop polymerization. Furthermore, binding of PABPII to the
poly(A) tail is essential for mRNA export into the cytoplasm.

All mRNAs have a poly(A) tail except for histone mRNAs.



N.B. During transcription the RNA Pol II CTD tail is huge and trails behind it recruiting
other proteins necessary for pre-mRNA processing (e.g. proteins part of the spliceosome).
Recall, that the CTD tail must be phosphorylated for RNA Pol II to be able to transcribe.
Essentially, RNA is never alone. It is always in a nucleo-protein complex.













hnRNPs (heterogeneous nuclear ribonucleoprotein particles): complexes
of RNA and protein present in the cell nucleus during gene transcription and subsequent
post-transcriptional modification of the newly synthesized RNA (pre-mRNA). The bound
proteins prevent the RNA from forming secondary structures (base pairing with itself).





RNA binding domains: Many were discovered in hnRNPs. Here are a few.

-RNA recognition motif (RRM): the most common RNA binding domain. 80 amino acids,
folds into a four-stranded β sheet flanked by two α-helices. Contains two highly conserved
sequences, RNP1 and RNP2, that contact the phosphates of RNA (RNP1 and RNP2 are found
across organisms ranging from yeast to humans).

-RGG box: contains 5 Arg-Gly-Gly repeats interspersed with aromatic amino acids (Phe,
Tyr, Trp). Structure unknown.

- KH motif: 45 residues, similar structure to RRM domain, but RNA binds by interacting
with a hydrophobic surface formed by two alpha helices and one beta strand.



Complementary DNA (cDNA): DNA synthesized from a mRNA template in a reaction
catalyzed by the enzymes reverse transcriptase and DNA polymerase.



Alternative Splicing: allows for protein diversity. In Drosophila allows for sex
determination.











Control of Sex-lethal expression


Sxl is under transcriptional control; it is expressed only in females in early
embryogenesis.
Later in development, the female-specific promoter is repressed and a different Sxl
promoter is activated that is on in both sexes.
However...Sxl pre-mRNA is alternatively spliced depending upon the presence of Sxl
protein.

Sxl binds to a sequence near the 3' end of the intron between exons 2 and 3 and blocks
the association between U2AF and the U2 snRNP. (Thus Sxl represses a particular
splice site.)
U1 snRNP binds properly to the 3' end of exon 2, but assembles into a spliceosome with
U2 snRNP bound to the branch point at the 3' end of the intron between exons 3 and
4. Thus exon 2 gets spliced to exon 4 and exon 3 goes out as part of a larger intron.









The sex-determination cascade

Sxl regulates tra pre-mRNA splicing by the same mechanism.
Tra regulates the splicing of dsx pre-mRNA. Only females have Tra. It forms a complex
with Rbp1 and Tra2, directs splicing of exon 3 to exon 4, and promotes cleavage and
polyadenylation at an alternative poly(A) site at the 3' end of exon 4. Males have
no Tra. Exon 4 is skipped, exon 3 is spliced to exon 5, and polyadenylation occurs
downstream of exon 6. Different forms of Dsx, a transcriptional repressor, are
produced in male and female embryos.
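
The dsx splicing choice described above can be written as a simple lookup on whether Tra is present. The function name is hypothetical, and the sketch assumes that exons 1 and 2 are included in both sexes, which the notes imply but do not state.

```python
def dsx_mrna_exons(tra_present):
    """Exons retained in the mature dsx mRNA.

    Assumption: exons 1-3 are shared by both sexes; only the 3' part differs.
    """
    shared = [1, 2, 3]
    if tra_present:
        # Female: exon 3 joined to exon 4, poly(A) site at the end of exon 4
        return shared + [4]
    # Male: exon 4 skipped, exon 3 joined to exon 5, poly(A) after exon 6
    return shared + [5, 6]


if __name__ == "__main__":
    print("female:", dsx_mrna_exons(True))
    print("male:", dsx_mrna_exons(False))
```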

Tra/Rbp1/Tra2 activate a particular splice site by binding to exon 4 and
recruiting U2AF and U2 snRNP to the 3' end of the intron between exons 3 and 4.

So splicing regulators can work as activators as well as repressors.

The stability of cytoplasmic mRNAs varies widely within and between organisms


Splicing activators and repressors:
Splicing activators and splicing repressors can regulate splicing. (e.g. Sxl protein acts as a
splicing repressor whereas Tra protein acts as a splicing activator). Whether an exon will
be skipped or not (indirectly determining what isoform will be made) results from the
combined influence of several splicing activators and repressors. RNA binding sites for
repressors, usually hnRNP proteins, can also occur in exons, where they are called exonic
splicing silencers. Binding sites for splicing activators, usually SR proteins, can also occur in
introns, where they are called intronic splicing enhancers.

N.B.

Eukaryotic mRNAs can be destabilized by a sequence motif.

-Many short-lived mRNAs in eukaryotes contain multiple copies of the sequence AUUUA in
their 3' UTR.

-Adding such sequence motifs to the 3' UTR of a gene that usually does not contain them
dramatically destabilizes the hybrid mRNA.
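
A quick way to screen a 3' UTR for this motif is to count AUUUA occurrences, allowing overlaps since the repeats often share their flanking A. The function name and the example sequences are hypothetical, and the count alone does not predict a half-life.

```python
def count_destabilizing_motifs(utr, motif="AUUUA"):
    """Count (possibly overlapping) occurrences of `motif` in a 3' UTR.

    DNA-style input is accepted: T is converted to U before searching.
    """
    utr = utr.upper().replace("T", "U")
    width = len(motif)
    return sum(1 for i in range(len(utr) - width + 1) if utr[i:i + width] == motif)


if __name__ == "__main__":
    print(count_destabilizing_motifs("GGAUUUAUUUAUUUACC"))  # hypothetical UTR fragment
```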





Nucleo-cytoplasmic Transport


The nucleus is separated from the cytoplasm by two membranes, which form the nuclear
envelope. Macromolecules move through the nuclear envelope via nuclear pores. Unlike
proteins imported into the cell's other organelles, proteins being imported into the nucleus
are already folded. Note that all proteins found in the nucleus are synthesized in the cytoplasm.
Each nuclear pore is formed from an elaborate structure called the nuclear pore complex
(NPC) that is larger than a ribosome. An NPC is composed of multiple copies of up to 30
different proteins called nucleoporins. These can be classed into three main types:
structural nucleoporins (Y-complex), membrane nucleoporins and FG-nucleoporins. The
FG-nucleoporins contain multiple repeats of short hydrophobic sequences, which are
thought to form a matrix that allows the passage of small molecules but excludes
unchaperoned hydrophilic proteins larger than 40 kDa. Larger proteins and
ribonucleoproteins need to be actively transported. Filaments can be found on both the
cytoplasmic and nucleoplasmic sides. The latter form the nuclear basket.

The nuclear pore complex is a highly ordered structure



Nuclear-localization signal (NLS): sequences found on proteins that are imported
into the nucleus. Without an NLS, a protein cannot enter the nucleus. There are a variety of
different NLSs, although different proteins may share the same NLS.


Ran: a monomeric G protein that exists in two conformations, one when bound to GTP and
a different one when bound to GDP. It is a type of GTPase.


Importin α and β: a heterodimeric nuclear-import receptor. Importin α binds to the NLS,
importin β binds to FG-nucleoporins.



Cargo protein: protein that is being imported or exported.

Guanine nucleotide exchange factors (GEFs) activate monomeric GTPases by stimulating
the release of guanosine diphosphate (GDP) to allow binding of guanosine
triphosphate (GTP).

Nuclear import


The nuclear transport receptor binds both to the NLS on a cargo protein to be transported
into the nucleus and to FG-repeats on nucleoporins. Nuclear transport receptors can be
either monomeric or dimeric, but in any case they have binding domains for both the NLS and
FG-repeats. Free nuclear transport receptor (importin) binds to the corresponding NLS of a
cargo protein, forming an importin-cargo complex. The cargo complex then translocates
through the NPC channel as the nuclear transport receptor interacts with FG-repeats. The cargo
complex rapidly reaches the nucleoplasm, and there the nuclear transport receptor
interacts with Ran-GTP (which was Ran-GDP, but became Ran-GTP due to GEF), causing a
conformational change in the nuclear transport receptor that releases the cargo protein
with its NLS into the nucleoplasm. The nuclear transport receptor-Ran-GTP complex then
diffuses back through the NPC. Having reached the cytoplasm, Ran interacts with GAP
(GTPase accelerating protein), which is actually a component of the NPC cytoplasmic
filaments. GAP stimulates Ran to hydrolyze its bound GTP to GDP. This allows for the release of
the nuclear transport receptor, which can then participate in a second cycle. Ran-GDP travels
back through the pore, where it can also participate in a second cycle. Asymmetrical
concentrations of the molecules at play ensure that the nuclear transport receptor-cargo
complex moves in a unidirectional fashion.

Nuclear Export

Proteins being exported out of the nucleoplasm contain a nuclear-export signal (NES) in
addition to an NLS. A nuclear transport receptor (in this case exportin 1) forms a complex
with Ran-GTP, which induces a conformational change that allows it to bind to the NES of
a cargo protein (thus forming a trimolecular cargo complex). Exportin 1 interacts with
FG-repeats, allowing the cargo complex to diffuse through the NPC. Once the complex arrives
in the cytoplasm, GAP stimulates hydrolysis of the GTP bound to Ran, causing the cargo
complex to dissociate and releasing the cargo protein into the cytosol. Exportin 1 and
Ran-GDP are then transported back into the nucleus through an NPC.
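
The import and export cycles differ mainly in how Ran-GTP affects cargo binding. The following is a minimal Python sketch of that logic, assuming a simplified two-compartment model; the names `RAN_STATE` and `cargo_bound` are hypothetical and chosen only for illustration.

```python
# Toy model of Ran-dependent transport directionality (illustrative only).
# Assumption: Ran is GTP-bound in the nucleus (where the GEF acts) and
# GDP-bound in the cytoplasm (where GAP acts).
RAN_STATE = {"nucleus": "GTP", "cytoplasm": "GDP"}


def cargo_bound(receptor: str, compartment: str) -> bool:
    """Return True if the receptor holds its cargo in this compartment."""
    ran_gtp_present = RAN_STATE[compartment] == "GTP"
    if receptor == "importin":
        # Binding of Ran-GTP makes importin release its NLS cargo.
        return not ran_gtp_present
    if receptor == "exportin":
        # Exportin binds its NES cargo only together with Ran-GTP.
        return ran_gtp_present
    raise ValueError(f"unknown receptor: {receptor}")


for receptor in ("importin", "exportin"):
    for compartment in ("cytoplasm", "nucleus"):
        state = "bound" if cargo_bound(receptor, compartment) else "released"
        print(f"{receptor} in {compartment}: cargo {state}")
```

In this sketch importin carries cargo only in the cytoplasm and exportin only in the nucleus, which is why each receptor moves its cargo in one direction.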


Some RNAs are exported through association with Ran


-Exportin t functions to export tRNAs. It binds fully processed tRNAs and Ran-GTP and
passes through NPCs. The complex dissociates in the cytosol when it interacts with Ran-
GAP.

-Export of ribosomal subunits requires Ran.

-Some specific mRNAs that associate with particular hnRNP proteins (HIV Rev for example)
can be exported through association with Ran.

-Most mRNAs are exported in a Ran-independent process using an mRNA exporter.






Exportation of mRNA

- The mRNP exporter consists of two subunits (NXF1 and NXT1).



Multiple NXF1/NXT1 dimers bind to nuclear mRNPs through cooperative interactions
with the mRNA and specific mRNP proteins


They form a domain that interacts with FG repeats in FG-nucleoporins (thus they act
similarly to nuclear transport receptors but bind to an mRNP instead of an NLS).

-Transport of the mRNP through the NPC is Ran-independent. In place of the Ran cycle,
Dbp5 acts as an ATP-driven motor to remove NXF1/NXT1 from the mRNP
complexes as they emerge on the cytoplasmic side. NXF1 and NXT1 are then re-imported
into the nucleus through the NPC by a Ran-dependent mechanism.


Other proteins assist in mRNA export

SR proteins: stimulate binding of the mRNA exporter to processed mRNAs (note that
this means that SR proteins are involved not only in splicing but also in export).

Exon-junction complex proteins: bind to regions of mRNA about 20 bases 5' to each
exon-exon junction. At least one such protein binds the mRNA exporter. One of these is
REF (RNA export factor). Why is keeping track of exon sites important?

Nuclear cap-binding complex (CBC): mRNAs are exported such that the 5' end goes out
first. Nuclear cap-binding proteins are believed to function in targeting mRNPs to the NPC.

Poly(A)-binding protein(s): the poly(A) tail is required for export.

After export, nuclear mRNP proteins dissociate from the mRNA and cytoplasmic factors
involved in translation are recruited.




mRNP remodeling

mRNP remodeling is a process in which the proteins associated with an mRNA in the
nuclear mRNP complex are exchanged for a different set of proteins as the mRNP is
transported through the NPC. Thus, proteins such as NXT1, NXF1, PABPII, CBC and REF
are left behind in the nucleus. On the other hand, proteins such as PABPI (which replaces
PABPII) and eIF4E, a translation initiation factor (which replaces CBC), are added.






-Pre-mRNAs are excluded from the export system (otherwise this would cause the
translation of defective proteins).

-Only fully spliced mature mRNAs get exported; mechanisms of this restriction are not fully
understood.

-This phenomenon may be associated with the activity of a protein bound to a nucleoporin
that actively blocks pre-mRNAs associated with snRNPs from leaving the nucleus

Translation


Three types of functional RNAs are involved in protein synthesis




Messenger RNA (mRNA): carries the genetic information from DNA in the form of
codons.


Transfer RNA (tRNA): deciphers the codons in mRNA. Each type of amino acid has
one or more tRNAs which bind it and which can carry it into a growing polypeptide
chain. tRNA has a three-nucleotide anticodon that can base pair with a codon in
mRNA.


Ribosomal RNA (rRNA): associates with proteins to form ribosomes. One of the
rRNAs catalyzes the formation of a peptide bond between the N of the amino group
of the incoming amino acid and the carboxy-terminal C on the growing polypeptide
chain.


The ribosome reads from 5' to 3' of the mRNA. Thus, the 5' end of the mRNA gets translated
first. Recall that the 5' end of the mRNA is first to go through the NPC. An amino acid can be
coded by several different codons and consequently, the code is termed degenerate.
However, a codon will code for only one amino acid.













The Genetic Code




There are 64 possible codons (4x4x4, since codons are triplets and there are 4 different
nucleotides). 61 specify a.a. and 3 are stop codons (the start codon also codes for an a.a.,
methionine).

The code is:
Comma-less (no interruptions between codons)
Non-overlapping (within a reading frame each nucleotide belongs to only one codon, although an mRNA can in principle be read in three possible frames)
Degenerate (more than one codon can encode the same amino acid).


The AUG codon for methionine is the usual start codon.


Three codons (UAA, UAG, and UGA) function as stop codons.




There are three different possible reading frames for every mRNA. Therefore, in theory, for
every mRNA molecule, there are three different possible polypeptides that could be
synthesized. However, the vast majority of mRNAs can be read in only one frame because
stop codons encountered in the other two possible reading frames terminate translation
before a functional protein is produced.
Frame-shifting can occur (e.g. the ribosome reads four nucleotides as one a.a., or backs up one
nucleotide, and then proceeds by reading three nucleotides at a time).
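
The codon counts and the three reading frames described above can be checked with a short Python sketch. The codon table is the standard genetic code written in U, C, A, G order; the example sequence and the helper name `translate_frame` are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(mrna: str, frame: int) -> str:
    """Translate an mRNA read 5'->3', starting at offset `frame` (0, 1 or 2)."""
    codons = (mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)
print(len(CODON_TABLE), "codons:", sense_codons, "sense,", len(stop_codons), "stop")

mrna = "AUGGCCUAAGGC"  # hypothetical sequence, not from the notes
for frame in range(3):
    print("frame", frame, translate_frame(mrna, frame))
```

Each frame gives a different amino acid string from the same nucleotides, and a stop codon ('*') appears in only one of them here.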

There are 30-40 different tRNAs in bacteria and as many as 50-100 in animal and plant
cells. Therefore, the number of tRNA types is greater than the number of a.a. used in protein
synthesis (20) and also differs from the number of codons that specify a.a. (61). This implies
that many a.a. have more than one tRNA to which they can attach AND that many tRNAs can pair
with more than one codon (wobble position). Recall, however, that a codon will always code for
the same a.a.

tRNA molecules fold into a stem-loop arrangement composed of four stems and three loops
that resembles a cloverleaf when drawn in 2D, but an L shape in 3D. The unlooped stem
(the acceptor stem) contains the free 3' and 5' ends of the chain. The 3' end has the sequence
CCA, which is added after synthesis. In addition, several bases are modified, creating
non-standard nucleotides (e.g. inosine).



The fact that a tRNA anticodon can recognize several, but not necessarily all, of the
codons corresponding to an a.a. can be explained by the wobble position; that is, the third
(3') base in the codon and the corresponding first (5') base in the anticodon.
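
Wobble pairing can be written out as a small lookup. This is a sketch that assumes the classic wobble rules (G-U pairs are allowed at the wobble position, and inosine pairs with C, A or U); the function name `codons_read_by` is hypothetical.

```python
# Bases the first (5') anticodon base can pair with at the third codon position.
# Assumption: classic wobble rules; 'I' stands for inosine.
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"C", "A", "U"},
}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read_by(anticodon: str) -> list[str]:
    """List the codons (5'->3') that an anticodon (5'->3') can pair with."""
    first, second, third = anticodon
    # Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    # anticodon base 2 with codon base 2, and anticodon base 1 (wobble)
    # with codon base 3.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(prefix + wobble_base for wobble_base in WOBBLE[first])


print(codons_read_by("GAA"))  # anticodon that reads two Phe codons
print(codons_read_by("IGC"))  # inosine lets one anticodon read three Ala codons
print(codons_read_by("CAU"))  # the Met anticodon reads only AUG
```

Only the 5' anticodon base is given extra freedom, so the first two codon positions still determine which amino acid is specified.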





Aminoacyl-tRNA synthetase: enzyme which ensures that each tRNA receives the
proper amino acid. There are 20 different synthetases, each of which recognizes only one
amino acid. Each covalently bonds its amino acid to the cognate tRNAs (there is often more
than one, since an amino acid usually has more than one anticodon).



Ribosomes consist of a large and small subunit, each containing one major rRNA molecule
and numerous different proteins. The large subunit contains one (in prokaryotes) or two
(in eukaryotes) additional small rRNAs.


A reading frame, the uninterrupted sequence of codons in mRNA from a specified start codon to a stop codon, is translated into
the linear sequence of amino acids in a polypeptide chain.


Decoding the nucleotide sequence in mRNA into the amino acid sequence of proteins (translation) requires tRNAs and
aminoacyl-tRNA synthetases.


All tRNAs have a similar 3-D structure that includes an acceptor arm for attachment of a specific amino acid and a stem-loop
with a 3-base anticodon at the opposite end.


Because of non-standard base-pairing (other than A-U and G-C), a tRNA may base-pair with more than one mRNA codon.
Similarly, a particular codon may base-pair with more than one tRNA carrying the same amino acid.


Each aminoacyl-tRNA synthetase recognizes a single amino acid and covalently links it to the appropriate tRNA with a
high-energy bond.


Eukaryotic translation

Initiation: assembly of a ribosome complex with an mRNA and an initiator tRNA charged with methionine.

Elongation: stepwise addition of amino acids to the polypeptide chain.

Termination: release of the completed polypeptide and of the ribosome, followed by disassembly of the ribosome.





The AUG codon, which initiates translation, codes for methionine (Met). Both
prokaryotes and eukaryotes contain two different methionine tRNAs: tRNAi^Met can
initiate protein synthesis, and tRNA^Met can incorporate methionine only into a growing
chain. The same aminoacyl-tRNA synthetase charges both types of tRNA, but only
Met-tRNAi^Met (i.e. activated methionine attached to tRNAi^Met) can bind to the P site (yes,
the first a.a. to be added is unusual as it binds directly to the P site, skipping the A site). This
happens before the small subunit binds to the mRNA.

Translation Initiation


Eukaryotic initiation factors (eIFs) from 1-5

Step 1.

eIF2-GTP binds to the first Met-tRNA that will be added. This forms a ternary complex.

N.B. Protein synthesis can therefore be negatively regulated by the phosphorylation of eIF2
(i.e. more phosphorylated eIF2 causes less translation).

Step 2.

40S subunits (small subunit) float in the cell with eIF3 and eIF1 bound at the E site. eIF1A
is bound at the A site. This leaves only the P site open.

The ternary complex with eIF5 will bind to the P site of the 40S subunit. This creates the
43S pre-initiation complex.


Step 3.

Remember that the first Met-tRNA binds to the small subunit before the subunit is even
attached to mRNA. Meanwhile, mRNA binds to the eIF4 complex.
-eIF4G binds to the PABPI on the poly(A) tail
-eIF4A acts as a helicase and unwinds the secondary structure of the RNA at the 5' end
-eIF4B joins and stimulates the helicase
-eIF4E binds at the 5' cap
This makes the mRNA into a loop structure.

Step 4.

The mRNA-eIF4 complex binds with the 43S pre-initiation complex creating a 48S
initiation complex.

Step 5.

Before the large ribosomal subunit can come in, the small subunit must find the initiation
site AUG. While eIF4A unwinds the RNA, the 48S initiation complex scans the mRNA 5'→3'
until it finds AUG.





Step 6.

Once the initiation site is found, eIF5 stimulates the hydrolysis of GTP to GDP (on the eIF2).
This causes a conformational change that allows the anticodon of the Met-tRNAi^Met to base
pair with the AUG at the P site.

Step 7.

The large ribosomal subunit (60S) can finally join the party. It causes the release of eIF1,
eIF2, eIF4 (which will go on and bind to another mRNA) and eIF5. On the other hand, eIF1A
remains at the A site with the newly arrived eIF5B-GTP.

Step 8.

The correct association of the 40S and 60S subunits results in the hydrolysis of the GTP
bound to eIF5B and the release of eIF5B-GDP and eIF1A, freeing the A site and forming the 80S
initiation complex with Met-tRNAi^Met base-paired to the initiation codon.
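
The scanning in Step 5 (moving 5'→3' until the first AUG is found) can be sketched in Python as follows. This is a simplified illustration: it assumes the first AUG is always used and ignores sequence context around the start codon; the function names and the example mRNA are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_start(mrna: str) -> int:
    """Return the 0-based index of the first AUG from the 5' end, or -1."""
    return mrna.find("AUG")


def open_reading_frame(mrna: str) -> list[str]:
    """Return codons from the first AUG up to (not including) the first in-frame stop."""
    start = scan_for_start(mrna)
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


mrna = "GGCACCAUGGCUUGAUAA"  # made-up 5' UTR + short coding region
print(scan_for_start(mrna))
print(open_reading_frame(mrna))
```

The position of the first AUG fixes the reading frame for everything downstream, which is why the small subunit must locate it before the 60S subunit joins.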

Cap-independent translation

Some cellular mRNAs contain an internal ribosome entry site (IRES) distant from the 5'
end. Translation of these mRNAs is eIF4E-independent. The IRES may fold into a structure
that can interact directly with the ribosome.

Many viral mRNAs lack a cap structure and translation is initiated at IRES elements.

Translation Elongation

Elongation factor (EF)

Step 1.

A charged tRNA molecule associated with EF1-GTP arrives at the A site.

Step 2.

If the codon matches the anticodon, the GTP of the EF1 is hydrolyzed to GDP (hence it is a
proofreading step). This causes a conformational change in the ribosome which allows the
binding of the aminoacyl-tRNA at the A site, the release of EF1-GDP and also favors the
formation of a peptide bond (peptidyltransferase reaction) between the second amino acid
and Met. This reaction is catalyzed by the large rRNA of the large ribosomal subunit (hence
catalytic rRNA).

Step 3.

Following the peptidyltransferase reaction, the ribosome translocates along the mRNA for a
distance of one codon, driven by the hydrolysis of a GTP molecule bound to EF2.
Now the tRNA from the initial Met is found at the E site (and will later exit) and the tRNA
carrying the growing chain is found at the P site, leaving the A site free for the next
aminoacyl-tRNA.

These steps are repeated and the polypeptide chain grows (always adding an a.a. to the
C-terminus).


Translation Termination

Release factors (RF)

eRF1 (which has a similar shape to tRNA) recognizes a stop codon (e.g. UAA) and binds at the
A site. eRF3-GTP together with eRF1 promotes the cleavage of the peptidyl-tRNA, thus releasing
the completed chain. This results in the desired polypeptide chain and a post-termination
complex consisting of a free tRNA at the P site, the mRNA, the 80S ribosome, eRF1 and
eRF3-GDP. A protein called ABCE1 uses energy from ATP hydrolysis to dissociate the
post-termination complex, allowing the parts to be re-used (the ribosome dissociates into
large and small subunits). In reality, mRNA is never free. Rather, the mRNA has other ribosomes
associated with it in various stages of elongation, PABPI bound to the poly(A) tail, and the
eIF4 complex associated with the 5' cap (making it circular and increasing translation
efficiency), ready to associate with another 43S pre-initiation complex.

Upon disassembly of the ribosome at termination, the large subunit associates with
eukaryotic initiation factor 6 (eIF6) and the small subunit associates with eIF3.

Increasing rates of translation

The elongation rate is relatively constant, and a typical protein molecule takes 30-60
sec to synthesize.

Simultaneous translation from the same mRNA by multiple ribosomes (polysomes)
increases the rate of protein synthesis.

Efficient recycling of ribosomal subunits and re-initiation also increase the rate of
protein synthesis.
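
As a rough arithmetic illustration of why polysomes raise output, the sketch below assumes a steady state in which every ribosome on the mRNA finishes one chain per synthesis time. The function name and the ribosome counts are hypothetical; only the 30-60 sec synthesis time comes from the notes.

```python
def proteins_per_minute(ribosomes_on_mrna: int, seconds_per_protein: float) -> float:
    """Steady-state output of one mRNA.

    Assumption: ribosomes are evenly spaced and each completes one
    polypeptide every `seconds_per_protein` seconds.
    """
    return ribosomes_on_mrna * 60.0 / seconds_per_protein


# One ribosome versus a hypothetical polysome of 10 ribosomes,
# at the slow (60 s) and fast (30 s) ends of the quoted range.
for ribosomes in (1, 10):
    for seconds in (60.0, 30.0):
        print(ribosomes, "ribosome(s),", seconds, "s per protein:",
              proteins_per_minute(ribosomes, seconds), "proteins/min")
```

Because the elongation rate is roughly constant, the number of ribosomes per mRNA is the term that changes output in this simple model.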


Cytoplasmic mechanisms of post-transcriptional control of gene expression

-Translation regulation
-mRNA degradation
-mRNA localization

Translation Regulation

Cytoplasmic polyadenylation

The egg cells (oocytes) of multicellular animals contain many mRNAs, encoding numerous
different proteins, that are not translated until after the egg is fertilized by a sperm cell.
Some of these stored mRNAs have a short poly(A) tail. Lengthening the tail in the cytoplasm
requires two sequences in the 3' UTR: the poly(A) signal
and one or more cytoplasmic polyadenylation elements (CPEs), which are U-rich. The CPE is
bound by a highly conserved CPE-binding protein (CPEB) that contains an RRM domain
and a zinc-finger domain.

When CPEB is not phosphorylated, it binds Maskin, which in turn binds eIF4E and blocks it
from associating with eIF4G. This prevents efficient translation since eIF4E cannot interact
with the other initiation factors, PABPI and the 40S ribosomal subunit.


When CPEB is phosphorylated, it releases Maskin, allowing cytoplasmic forms of CPSF
(cleavage and polyadenylation specificity factor) and PAP to bind, leading to the
lengthening of the poly(A) tail. Now, PABPI can bind to the poly(A) tail and interact with
eIF4G to initiate translation.


Iron-dependent regulation of mRNA translation

Mechanisms have evolved for controlling the translation of certain specific mRNAs. This is
usually done by sequence-specific RNA-binding proteins that bind to a particular sequence
or RNA structure in the mRNA. When such a protein binds in the 5' UTR of an mRNA, the
ribosome cannot scan for the first initiation codon, and translation initiation is inhibited.

Ferritin is a protein that binds iron. When too much iron is present, it is desirable to
produce more ferritin. The 5' UTR of ferritin mRNA contains iron-response elements
(IREs) that have a stem-loop structure. The IRE-binding protein (IRE-BP) recognizes five
specific bases in the IRE loop and the duplex nature of the stem.

When iron concentrations are low, IRE-BP is in an active conformation that binds to the
IREs. The bound IRE-BP blocks the small ribosomal subunit from scanning for the AUG
start codon, thereby inhibiting translation initiation. Less ferritin is produced and iron
concentrations rise.

When iron concentrations are high, IRE-BP is in an inactive conformation that does not
bind to the 5' IREs and translation proceeds normally. Ferritin is produced and binds iron,
thus decreasing levels of free iron.


The transferrin receptor (TfR) ensures that iron is imported into the cell. The 3' UTR of TfR
mRNA contains IREs whose stems have AU-rich destabilizing sequences.

When iron concentrations are high, IRE-BP is in an inactive conformation and the AU-rich
sequences promote the degradation of the mRNA.

When iron concentrations are low, IRE-BP is in its active conformation and binds to the 3'
IREs in TfR mRNA, preventing degradation.
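
The two opposite effects of IRE-BP on ferritin and TfR mRNAs can be summarized as a small truth table. This is a sketch that reduces iron level to a yes/no switch, which is a simplifying assumption; the function name `iron_response` and the dictionary keys are hypothetical.

```python
def iron_response(iron_high: bool) -> dict:
    """Summarize IRE-BP activity and its consequences for a given iron level."""
    # IRE-BP is in its active, IRE-binding conformation only when iron is low.
    ire_bp_active = not iron_high
    return {
        "ire_bp_binds_ires": ire_bp_active,
        # Ferritin: IRE-BP bound in the 5' UTR blocks scanning, so translation
        # proceeds only when IRE-BP is inactive.
        "ferritin_translated": not ire_bp_active,
        # TfR: IRE-BP bound in the 3' UTR protects the mRNA from degradation.
        "tfr_mrna_stable": ire_bp_active,
    }


for iron_high in (False, True):
    label = "high iron" if iron_high else "low iron"
    print(label, iron_response(iron_high))
```

The same protein in the same conformation therefore lowers ferritin production and raises iron import, because the IREs sit at opposite ends of the two mRNAs.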











Regulatory RNA

Regulatory RNAs include short single-stranded RNAs called micro RNAs (miRNAs) and short
interfering RNAs (siRNAs). Both base-pair with specific target mRNAs, either inhibiting their
translation (miRNAs) or causing their degradation (siRNAs). Many miRNAs can target more than
one mRNA and hence regulate gene expression. siRNAs are involved in a mechanism called RNA
interference.

Small regulatory RNAs (~21 nts) base pair with 3' UTR sequences of specific mRNAs.

miRNAs are encoded by genes and are formed by processing of a 70-nt precursor RNA that
forms a hairpin with a few base mismatches in the stem. A ribonuclease called Dicer
produces mature miRNAs from these precursors.
siRNAs are produced from dsRNA through Dicer-mediated cleavage similar to miRNA
processing.
miRNAs do not base pair precisely with their target RNAs and repress their translation.
siRNAs do base pair precisely with their target RNAs and induce their
cleavage/degradation.

Note that the double-stranded mi/siRNA has a two-base overhang at each 3' end.
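
The distinction between precise (siRNA-like) and imprecise (miRNA-like) pairing can be sketched as a comparison between a small RNA and a target site. This is only an illustration: the `max_mismatches` cutoff is an arbitrary, hypothetical parameter, and the sequences are made up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence (5'->3') that pairs perfectly with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def silencing_outcome(small_rna: str, target_site: str, max_mismatches: int = 5) -> str:
    """Classify the outcome for an equal-length target site (both written 5'->3').

    `max_mismatches` is an arbitrary illustrative cutoff, not a measured value.
    """
    if len(small_rna) != len(target_site):
        raise ValueError("small RNA and target site must have equal length")
    perfect_site = reverse_complement(small_rna)
    mismatches = sum(a != b for a, b in zip(perfect_site, target_site))
    if mismatches == 0:
        return "cleavage"    # precise pairing, siRNA-like
    if mismatches <= max_mismatches:
        return "repression"  # imprecise pairing, miRNA-like
    return "none"            # too few pairs to count as a target in this toy model


small = "AUGGCUAGCUAGGCUAACGUA"              # hypothetical 21-nt small RNA
perfect = reverse_complement(small)
print(silencing_outcome(small, perfect))               # fully complementary site
print(silencing_outcome(small, "A" + perfect[1:]))     # one mismatch introduced
```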




mRNA degradation


The concentration of an mRNA is a function of both its rate of synthesis and its rate of
degradation. In general, eukaryotic mRNAs have longer half-lives than those of prokaryotes.
There are three different pathways by which mRNA can be degraded:

1-decapping pathway (deadenylation-independent)

Some mRNAs do not need the poly(A) tail to be removed for degradation. This is because
certain sequences at the 5' end of an mRNA seem to make the cap sensitive to the
decapping enzyme. For these mRNAs, the rate at which they are decapped controls the rate
at which they are degraded because once the 5' cap is removed, the RNA is rapidly
hydrolyzed by the 5'→3' exonuclease.


2-deadenylation-dependent pathway

Most mRNAs are degraded by the deadenylation-dependent pathway. The poly(A) tail is
gradually degraded by deadenylating nucleases. When it is shortened sufficiently, PABPI
can no longer bind and stabilize the interaction of the 5' cap and translation initiation
factors (eIFs). The exposed cap is removed by a decapping enzyme and the mRNA is
degraded by a 5'→3' exonuclease. Removal of the poly(A) tail also makes mRNAs
susceptible to degradation by cytoplasmic exosomes containing 3'→5' exonucleases.

3-endonucleolytic pathway

Some mRNAs are degraded without prior decapping or deadenylation. In this pathway the RNA is
cleaved internally, for example by the siRNA-RISC complex. The fragments are then degraded by
exonucleases.





mRNAs that have high rates of translation initiation tend to have slow rates of deadenylation.
This is explained by the fact that the translation initiation factors are bound both to the 5'
end and to PABPI at the poly(A) tail, which stabilizes PABPI on the tail and slows
deadenylation. AU-rich elements in the 3' UTR bind proteins that recruit a
deadenylation enzyme and exosome to degrade the mRNA. This allows for short-lived
mRNAs that are translated at high frequencies.
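
The statement at the start of this section, that mRNA concentration depends on both synthesis and degradation rates, can be made concrete with a simple first-order model. The model (constant synthesis, degradation proportional to concentration) is an assumption introduced here for illustration, and the numbers are arbitrary.

```python
import math


def steady_state_mrna(synthesis_rate: float, half_life: float) -> float:
    """Steady-state mRNA level for dC/dt = synthesis_rate - k_deg * C.

    Assumes first-order decay, so k_deg = ln(2) / half_life and the
    steady state is synthesis_rate / k_deg.
    """
    k_deg = math.log(2) / half_life
    return synthesis_rate / k_deg


# Same synthesis rate (arbitrary units per minute), different half-lives (minutes).
for half_life in (5.0, 30.0, 600.0):
    level = steady_state_mrna(synthesis_rate=10.0, half_life=half_life)
    print(f"half-life {half_life:6.1f} min -> steady-state level {level:8.1f}")
```

In this model a short half-life keeps the steady-state level low, and it also means the level changes quickly when synthesis changes, which is the usual rationale for short-lived regulatory mRNAs.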

Localization of mRNA
3. Regulation of mRNA localization in cytoplasm

Some mRNAs are asymmetrically localized within the cytoplasm at the site where the
proteins they encode are required. Often 3' UTR elements direct localization.


o Can localize all your mRNA at one end and they will never be translated
unless they're in the right place.

o Gene with the promoter at the 5' end (regular). You get no localization; the mRNA diffuses
everywhere in the cell.
o If you play around with the 3' end, change the 3' end, you will get localization
where it's required.
So the 3' end is important in regulation of mRNA localization.
