
Chapter 2

Structure and Behavior of Genes and Chromosomes
What will we study in this chapter
DNA structure: a brief review
The central dogma: DNA → RNA → protein

Structure of human chromosomes

Mitosis and Meiosis


In the next day or so Crick and I shall send a note to
Nature proposing our structure (of DNA) as a
possible model, at the same time emphasizing its
provisional nature and the lack of proof in its favor.
Even if wrong, I believe it to be interesting since it
provides a concrete example of a structure composed
of complementary chains. If, by chance, it is right,
then I suspect we may be making a slight dent into
the manner in which DNA can reproduce itself.
- James Watson, from a letter to Max Delbrück, March 12, 1953
James Watson, a Chicago native, was a child
prodigy who entered college at the age of 15.
By 23 he was a postdoctoral fellow at the
University of Copenhagen studying genetics.
During this fellowship he heard Maurice Wilkins of
King’s College give a talk about investigations
into the molecular structure of DNA. Watson
was hooked, and he moved to Cambridge
University’s Cavendish Laboratory to study
DNA’s structure.
Watson worked well there with Francis Crick, and
the two began using three-dimensional molecular
models to test structural ideas. In 1953 Wilkins
showed Watson and Crick X-ray crystallographic
images of DNA taken by Wilkins’s coworker Rosalind
Franklin. On the basis of these images—which had
been shown without Franklin’s knowledge—Watson
and Crick worked out a double-helical structural
model within a few weeks. For this Watson, Crick,
and Wilkins were awarded the Nobel Prize in
physiology or medicine in 1962.
In addition to his work on DNA, Watson also
discovered the molecular structure of the
tobacco mosaic virus and helped uncover the
role of messenger RNA in protein synthesis.
Watson described the discovery of DNA's
structure in his highly popular book The Double Helix,
though the book has been criticized for
downplaying Franklin’s role in the discovery.
DNA structure: a brief review
DNA is a polymeric nucleic acid macromolecule
composed of three types of units: a five-carbon
sugar, deoxyribose; a nitrogen-containing base;
and a phosphate group.

[Figure: a nucleotide, showing the base, the phosphate group, and the five-carbon sugar]
In DNA, there are two purine bases, adenine
(A) and guanine (G), and two pyrimidine bases,
thymine (T) and cytosine (C).

Nucleotides, each composed of a base, a
phosphate, and a sugar moiety, polymerize into
long polynucleotide chains by 5’–3’
phosphodiester bonds formed between adjacent
deoxyribose units.
In the human genome, these polynucleotide chains
(in their double-helix form) are hundreds of millions of
nucleotides long, ranging in size from approximately
50 million base pairs (for the smallest chromosome,
chromosome 21) to 250 million base pairs (for the
largest chromosome, chromosome 1).
The anatomical structure of DNA carries the
chemical information that allows the exact
transmission of genetic information from one
cell to its daughter cells and from one
generation to the next. At the same time, the
primary structure of DNA specifies the amino
acid sequences of the polypeptide chains of
proteins.
DNA replication
DNA replication is semiconservative, and
synthesis of DNA strands is
semidiscontinuous.
During the process of DNA synthesis (DNA
replication), the two DNA strands of each
chromosome are unwound by a helicase enzyme
and each DNA strand directs the synthesis of a
complementary DNA strand to generate two
daughter DNA duplexes, each of which is identical
to the parent molecule.
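The template-directed copying described above can be sketched in a few lines of Python. This is only an illustration: the function names (`complement_strand`, `replicate`) and the example sequence are assumptions made for this sketch, and every strand is written 5’→3’.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Return the strand complementary to `template`, written 5'->3'.

    The new strand is antiparallel to its template, so the complemented
    bases are reversed to keep the 5'->3' writing convention.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative replication of a duplex (both strands given 5'->3').

    Each daughter duplex keeps one parental strand and gains one newly
    synthesized complementary strand.
    """
    top, bottom = duplex
    daughter_1 = (top, complement_strand(top))        # old top + new bottom
    daughter_2 = (complement_strand(bottom), bottom)  # new top + old bottom
    return [daughter_1, daughter_2]


# Hypothetical 9-bp duplex used only for illustration.
parent = ("ATGCCGTTA", complement_strand("ATGCCGTTA"))
daughters = replicate(parent)
print(parent)
print(daughters)
```

Both daughter duplexes come out identical to the parent, and each contains exactly one of the two original strands, which is what "semiconservative" means.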
DNA replication is initiated at specific
points, which have been termed origins
of replication. Starting from such an
origin, the initiation of DNA replication
results in a replication fork, where the
parental DNA duplex bifurcates into two
daughter DNA duplexes.
Replication fork: the point of bifurcation when a DNA
double helix is being replicated. Two replication forks
proceeding outwards from a single starting point create a
replication bubble.
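The two daughter strands must run in
opposite directions, so the overall direction of
chain growth is 5’→3’ for one
daughter strand, the leading strand, but
3’→5’ for the other daughter strand,
the lagging strand. Because DNA polymerases
can add nucleotides only to a free 3’ end, the
lagging strand is made discontinuously as short
fragments (Okazaki fragments), each synthesized
5’→3’ and later joined together.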
The central dogma: DNA → RNA → protein
Genetic information is contained in DNA
in the chromosomes within the cell
nucleus, but protein synthesis, during
which the information encoded in the DNA
is used, takes place in the cytoplasm.
The molecular link between these two related
types of information (the DNA code of genes and
the amino acid code of proteins) is ribonucleic
acid (RNA). The chemical structure of RNA is
similar to that of DNA, except that each nucleotide
in RNA has a ribose sugar component instead of
a deoxyribose; in addition, uracil (U) replaces
thymine as one of the pyrimidines of RNA. An
additional difference between RNA and DNA is
that RNA in most organisms exists as a single-
stranded molecule, whereas DNA exists as a
double helix.
The informational relationships among DNA,
RNA, and protein are intertwined: DNA directs
the synthesis and sequence of RNA, RNA directs
the synthesis and sequence of polypeptides, and
specific proteins are involved in the synthesis and
metabolism of DNA and RNA. This flow of
information is referred to as the “central dogma”
of molecular biology.
Genetic information is stored in DNA by means of a
code (the genetic code) in which the sequence of
adjacent bases ultimately determines the sequence of
amino acids in the encoded polypeptide. First, RNA
is synthesized from the DNA template through a
process known as transcription. The RNA, carrying
the coded information in a form called messenger
RNA (mRNA), is then transported from the nucleus
to the cytoplasm, where the RNA sequence is
decoded, or translated, to determine the sequence of
amino acids in the protein being synthesized.
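As a rough sketch of these two decoding steps, the Python below transcribes a DNA coding strand into mRNA and translates it with the standard genetic code. The function names and the example sequence are assumptions for illustration, and the model is simplified: it ignores RNA processing and simply starts translation at the first AUG.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Keyed by RNA codon (U instead of T).
CODON_TABLE = {
    "".join(codon).replace("T", "U"): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding-strand sequence, written 5'->3'.
coding_dna = "GGATGGCCATTGTAATGGGCCGCTGAAAGG"
mrna = transcribe(coding_dna)
protein = translate(mrna)
print(mrna)     # GGAUGGCCAUUGUAAUGGGCCGCUGAAAGG
print(protein)  # MAIVMGR
```

The protein is printed in one-letter amino acid codes (M = methionine, A = alanine, and so on).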
The process of translation occurs on ribosomes, which
are cytoplasmic organelles with binding sites for all of
the interacting molecules, including the mRNA,
involved in protein synthesis. Ribosomes are
themselves made up of many different structural
proteins in association with a specialized type of RNA
known as ribosomal RNA (rRNA). Translation
involves yet a third type of RNA, transfer RNA
(tRNA), which provides the molecular link between the
coded base sequence of the mRNA and the amino acid
sequence of the protein.
An organism may contain many types of
somatic cells, each with distinct shape and
function. However, they all have the same
genome. The genes in a genome do not have
any effect on cellular functions until they are
"expressed". Different types of cells express
different sets of genes, thereby exhibiting
various shapes and functions.
"Gene expression" means the production of a
protein or a functional RNA from its gene.
Several steps are required:

Transcription: A DNA strand is used as the template
to synthesize an RNA strand, which is called the
primary transcript.

RNA processing: This step involves modifications of
the primary transcript to generate a mature mRNA
(for protein genes) or a functional tRNA or rRNA.
For RNA genes (tRNA and rRNA), the expression is
complete after a functional tRNA or rRNA is
generated. However, protein genes require additional
steps:

Nuclear transport: mRNA has to be transported
from the nucleus to the cytoplasm for protein
synthesis.
Protein synthesis: In the cytoplasm, mRNA binds
to ribosomes, which can synthesize a polypeptide
based on the sequence of mRNA.
Essential steps involved in the expression of protein genes.
Gene Structure and Organization

The human genome is the term used to describe the
total genetic information (DNA content) in human
cells. It really comprises two genomes: a complex
nuclear genome, which accounts for 99.9995% of the
total genetic information, and a simple mitochondrial
genome, which accounts for the remaining 0.0005%.
The nucleus of a human cell contains more than 99%
of the cellular DNA. The nuclear genome is
distributed between 24 different types of linear
double-stranded DNA molecule, each of which has
histones and other nonhistone proteins bound to it,
constituting a chromosome.

The 24 different chromosomes (22 types of
autosome and two sex chromosomes, X and Y)
can easily be differentiated by chromosome
banding techniques.
In its simplest form, a gene can be visualized
as a segment of a DNA molecule containing
the code for the amino acid sequence of a
polypeptide chain and the regulatory
sequences necessary for expression. This
description, however, is inadequate for genes
in the human genome (and indeed in most
eukaryotic genomes), because few genes exist
as continuous coding sequences.
By definition, a gene includes the entire nucleic acid
sequence necessary for the expression of its product
(peptide or RNA). Such a sequence may be divided
into a regulatory region and a transcriptional region.
The regulatory region may be near to or far from the
transcriptional region. The transcriptional region
consists of exons and introns. Exons encode a
peptide or functional RNA. Introns are
removed from the transcript after transcription.
Introns are sections of DNA within a gene
that do not encode part of the protein that
the gene produces, and are spliced out of the
mRNA that is transcribed from the gene
before it is exported from the cell nucleus.
Introns exist mainly (but not only) in
eukaryotic cells. The regions of a gene that
remain in the spliced mRNA are called
exons.
The vast majority of human genes are
interrupted by one or more noncoding regions.
These intervening sequences, called introns, are
initially transcribed into RNA in the nucleus but
are not present in the mature mRNA in the
cytoplasm. Thus, information from the intronic
sequences is not normally represented in the
final protein product. Introns alternate with
coding sequences, or exons, that ultimately
encode the amino acid sequence of the protein.
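Removal of introns can be pictured as joining the exon intervals of the primary transcript. The sketch below assumes the exon coordinates are already known. The `splice` helper, the made-up transcript, and its coordinates are illustrative assumptions, not data from a real gene.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments, given as 0-based half-open (start, end) intervals."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = (
    "AUGGCC"         # exon 1   (positions 0-5)
    + "guaagucuuag"  # intron 1 (positions 6-16)
    + "AUUGUA"       # exon 2   (positions 17-22)
    + "guacgauucag"  # intron 2 (positions 23-33)
    + "UGGUAA"       # exon 3   (positions 34-39)
)
exons = [(0, 6), (17, 23), (34, 40)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCCAUUGUAUGGUAA
```

Only the upper-case exon sequence survives in the mature mRNA; the intronic sequence is transcribed but never reaches the ribosome.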
Structural features of a typical human gene

A gene includes not only the actual coding
sequences but also adjacent nucleotide
sequences required for the proper expression of
the gene, that is, for the production of a
normal mRNA molecule, in the correct amount,
in the correct place, and at the correct time
during development or during the cell cycle.
The adjacent nucleotide sequences provide the
molecular “start” and “stop” signals for the synthesis
of mRNA transcribed from the gene. At the 5’ end of
the gene lies a promoter region, which includes
sequences responsible for the proper initiation of
transcription. Within the 5’ region are several DNA
elements whose sequence is conserved among many
different genes. This conservation, together with
functional studies of gene expression in many
laboratories, indicates that these particular sequence
elements play an important role in regulation.
Both promoters and other regulatory
elements (located either 5’ or 3’ of a gene or
in its introns) can be sites of mutation in
genetic disease that can interfere with the
normal expression of a gene. These
regulatory elements include enhancers,
silencers, and locus control regions
(LCRs).
At the 3’ end of the gene lies an untranslated
region of importance that contains a signal for
addition of a sequence of adenosine residues
(the so-called polyA tail) to the end of the
mature mRNA.
Organization of the human genome
Regions of the genome with similar
characteristics of organization, replication, and
expression are not arranged randomly but,
rather, tend to be clustered together. This
functional organization of the genome correlates
remarkably well with its structural organization
as revealed by metaphase chromosome banding.
The overall significance of this functional
organization is that chromosomes are not just a
random collection of different types of genes
and other DNA sequences. Some chromosome
regions, or even whole chromosomes, are quite
high in gene content (“gene-rich”), whereas
others are low (“gene-poor”). Certain types of
sequence are characteristic of the different
physical hallmarks of human chromosomes.
The clinical consequences of abnormalities of
genome structure reflect the specific nature
of the genes and sequences involved. Thus,
abnormalities of gene-rich chromosomes or
chromosomal regions tend to be much more
severe clinically than similar-sized defects
involving gene-poor parts of the genome.
Gene families
Many genes belong to families of closely related
DNA sequences, recognized as families because of
similarity of the nucleotide sequence of the genes
themselves or of the amino acid sequence of the
encoded polypeptides.

"Gene family" refers to a set of genes with
homologous sequences. For example, H2A, H2B, H3
and H4 are in the same histone gene family. Their
products have similar structures and functions.
DNA sequences that closely resemble
known genes but are nonfunctional are
called pseudogenes. Pseudogenes are
widespread in the genome and are thought
to be byproducts of evolution, representing
genes that were once functional but are now
vestigial, having been inactivated by
mutations in coding or regulatory
sequences.
The human genome consists of three broad
sequence components, distinguished by how quickly
denatured single strands reassociate:
•Single copy, or at least very low copy number: This class
accounts for 50-60% of mammalian DNA and reassociates very
slowly. A single strand from a single-copy sequence will require
some considerable time to find a complementary partner
strand, given that the vast majority of DNA fragments are
unrelated to it.
•Moderately repetitive: Roughly 25-40% of mammalian DNA
reassociates at an intermediate rate. This class includes
interspersed repeats.
•Highly repetitive: About 10-15% of mammalian DNA
reassociates very rapidly. This class includes tandem repeats.
Several different categories of repetitive DNA
are recognized. A useful distinguishing feature is
whether the repeated sequences (“repeats”) are
clustered in one or a few locations or whether
they are dispersed throughout the genome,
interspersed with single-copy sequences along
the chromosome.
Depending on the average size of the arrays of
repeat units, highly repetitive noncoding DNA
belonging to this class can be grouped into three
subclasses: satellite, minisatellite, and microsatellite
DNA.

Satellite DNA is composed of very long arrays
of tandem repeats which can be separated from
bulk DNA by buoyant density gradient
centrifugation.
The size of a satellite DNA ranges from 100 kb to
over 1 Mb. In humans, a well known example is the
alphoid DNA located at the centromere of all
chromosomes. Its repeat unit is 171 bp and the
repetitive region accounts for 3-5% of the DNA in
each chromosome. Other satellites have a shorter
repeat unit. Most satellites in humans or in other
organisms are located at the centromere.
Minisatellite DNA is composed of moderately
sized arrays of tandem repeats and is often located
at or close to telomeres.

Microsatellite DNA is defined by the presence of
short arrays of tandem simple repeat units and is
dispersed throughout the human genome.
The size of a minisatellite ranges from 1 kb to 20 kb.
One type of minisatellite is called a variable number
tandem repeat (VNTR). Its repeat unit ranges from 9
bp to 80 bp. VNTRs are located in non-coding regions.
The number of repeats for a given minisatellite may
differ between individuals. This feature is the basis of
DNA fingerprinting.
Another type of minisatellite is the telomere. In a
human germ cell, the size of a telomere is about 15 kb.
In an aging somatic cell, the telomere is shorter. The
telomere contains the tandemly repeated sequence
GGGTTA (often written as TTAGGG).
Microsatellites are also known as short tandem
repeats (STR), because a repeat unit consists of only 1
to 6 bp and the whole repetitive region spans less than
150 bp. Similar to minisatellites, the number of
repeats for a given microsatellite may differ between
individuals. Therefore, microsatellites can also be
used for DNA fingerprinting.
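The idea behind repeat-based fingerprinting can be sketched by counting how many consecutive copies of a repeat unit each allele carries. The helper `longest_repeat_run`, the flanking sequences, and the two alleles below are hypothetical and exist only for this illustration.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles of one CA-repeat microsatellite locus.
flank_left, flank_right = "GGTTCAGT", "TTGACCGA"
allele_a = flank_left + "CA" * 12 + flank_right
allele_b = flank_left + "CA" * 17 + flank_right

print(longest_repeat_run(allele_a, "CA"))  # 12
print(longest_repeat_run(allele_b, "CA"))  # 17
```

Two individuals (or the two chromosomes of one individual) that differ in repeat number at such a locus give different fragment lengths, and this length difference is what a fingerprinting assay reads out.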
In addition to satellite DNAs, another major
class of repetitive DNA in the genome consists
of related sequences that are dispersed
throughout the genome rather than localized.
Although many small DNA families meet this
general description, two in particular warrant
discussion because together they make up a
significant proportion of the genome and
because they have been implicated in genetic
diseases.
The best-studied dispersed repetitive elements
belong to the so-called Alu family. The members
of this family are about 300 base pairs in length
and are recognizably related to each other
although not identical in sequence. In total, there
are about 500,000 Alu family members in the
genome, making up at least several percent of
human DNA. In some regions of the genome, they
make up a much higher percentage of the DNA.
A second major dispersed, repetitive DNA
family is called the L1 family. L1 elements
are long, repetitive sequences (up to 6 kb
in length) that are found in about 100,000
copies per genome. They are plentiful in
some regions of the genome but relatively
sparse in others.
Structure of human chromosomes
[Figure: human chromosome at metaphase]
The composition of genes in the human
genome, as well as the determinants of
their expression, is specified in the DNA
of the 46 human chromosomes. As we
saw in an earlier section, each human
chromosome is believed to consist of a
single, continuous DNA double helix;
that is, each chromosome in the nucleus
is a long, linear double-stranded DNA
molecule.
Chromatin in the nucleus
Chromosomes are not naked DNA double
helices, however. The DNA molecule of a
chromosome exists as a complex with a
family of basic chromosomal proteins called
histones and with a heterogeneous group of
acidic, nonhistone proteins that are much
less well characterized, but that appear to
be critical for establishing a proper
environment to ensure normal chromosome
behavior and appropriate gene expression.
Together, this complex of DNA and protein
is called chromatin.
There are five major types of histones that play
a critical role in the proper packaging of the
chromatin fiber. Two copies each of the four core
histones H2A, H2B, H3, and H4 constitute an
octamer, around which a segment of DNA double
helix winds, like thread around a spool.
Approximately 140 base pairs of DNA are
associated with each histone core, making just
under two turns around the octamer. After a short
(20 to 60 base-pair) “spacer” segment of DNA, the
next core DNA complex forms, and so on, giving
chromatin the appearance of beads on a string.
Each complex of DNA with core histones is called a
nucleosome, which is the basic structural unit of
chromatin.
[Figure: nucleosome]
The fifth histone, H1, appears to bind to DNA
at the edge of each nucleosome, in the
internucleosomal spacer region. The amount
of DNA associated with a core nucleosome,
together with the spacer region, is about 200
base pairs.
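A back-of-the-envelope calculation shows the scale of this packaging. The sketch below uses the figure of about 200 base pairs per nucleosome repeat together with the approximate chromosome sizes quoted earlier in the chapter. The function name is an assumption, and the results are order-of-magnitude estimates only.

```python
# About 200 bp of DNA per nucleosome (core plus spacer), as stated in the text.
BP_PER_NUCLEOSOME_REPEAT = 200


def estimated_nucleosomes(chromosome_bp: int) -> int:
    """Rough nucleosome count, assuming uniform 200-bp spacing along the DNA."""
    return chromosome_bp // BP_PER_NUCLEOSOME_REPEAT


# Approximate chromosome sizes from the DNA structure review above.
for name, size_bp in [("chromosome 21", 50_000_000), ("chromosome 1", 250_000_000)]:
    print(f"{name}: about {estimated_nucleosomes(size_bp):,} nucleosomes")
# chromosome 21: about 250,000 nucleosomes
# chromosome 1: about 1,250,000 nucleosomes
```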
The long strings of nucleosomes are themselves
further compacted into a secondary helical chromatin
structure that appears under the electron microscope
as a thick, 30-nm-diameter fiber (about three times
thicker than the nucleosomal fiber). This cylindrical
“solenoid” fiber (from the Greek solenoeides, “pipe-
shaped”) appears to be the fundamental unit of
chromatin organization. The solenoids are themselves
packed into loops or domains attached at intervals of
about 100 kb or so to a nonhistone protein scaffold or
matrix.
During the cell cycle, chromosomes pass
through orderly stages of condensation and
decondensation. In the interphase nucleus,
chromosomes and chromatin are quite
decondensed in relation to the highly condensed
state of chromatin in metaphase. Nonetheless,
even in interphase chromosomes, DNA in
chromatin is substantially more condensed than
it would be as a native, protein-free double helix.
Different levels of DNA condensation. (1) DNA double helix. (2)
Chromatin strand (DNA with histones). (3) Condensed
chromatin during interphase with centromere. (4) Condensed
chromatin during prophase (two copies of the DNA molecule
are now present). (5) Chromosome during metaphase.
Mitosis and Meiosis
Two types of cell division:
Mitosis: occurs in somatic cells; its objective is to make
identical body cells, each with a full set of identical
chromosomes (2n).

Diploid: The number of chromosomes normally
present in a somatic cell. A diploid cell contains two
copies of each chromosome (2n).
Meiosis: production of sex cells (gametes);
its objective is to halve the chromosome number
(2n to n), hence the name "reduction division".
Haploid: The normal number of chromosomes
present in an egg or sperm. A haploid cell
contains a single set of chromosomes (n). In
humans, the haploid number of chromosomes is
23.
Mitosis Stages
Interphase: the period between cell
divisions; cell spends most of its
time in this stage, generally
carrying out its normal activities.
Nearing the end of this phase, the
DNA and cell contents will replicate
in preparation for division.
G1 Phase or the "Gap 1" Phase
The chromosomes decondense as they enter the G1
phase; this is a physiologically active time for the cell.
The cell synthesizes the necessary enzymes and
proteins needed for cell growth. Each chromosome consists of a
single unreplicated DNA double helix (with histone and non-histone
proteins). In G1, the cell may be growing, active,
and performing many intense biochemical activities.
S Phase or the "Synthesis" Phase
DNA and chromosomal proteins are replicated. This
phase lasts a few hours.
After replication, each chromosome is composed of two
identical copies called sister chromatids.
G2 Phase or the "Gap 2" Phase
Between synthesis and mitosis. The mitotic spindle
proteins are synthesized. The mitotic spindle is a
structure that is involved with the movement of
chromosomes during mitosis.
•Prophase: The chromatin, diffuse in interphase, condenses
into chromosomes. Each chromosome has duplicated and now
consists of two sister chromatids. At the end of prophase, the
nuclear envelope breaks down into vesicles.
•Metaphase: The chromosomes align at the equatorial plate
and are held in place by microtubules attached to the mitotic
spindle and to part of the centromere.
•Anaphase: The centromeres divide. Sister chromatids separate
and move toward the corresponding poles.
•Telophase: Daughter chromosomes arrive at the poles and the
microtubules disappear. The condensed chromatin expands
and the nuclear envelope reappears.
•Cytokinesis: The cytoplasm divides and the cell membrane
pinches inward, ultimately producing two daughter cells.
The Importance of Mitosis
Mitosis ensures that each new body cell has the same
genetic makeup as its parent. Mutations can and do
occur occasionally but, for the most part, all of your
body cells have identical DNA.
Mitosis not only functions to replace cells and make
new cells (growth); it also reduces cell size.
Meiosis: the process of nuclear division that
reduces the number of chromosomes by half
(2n → n).
The two nuclear divisions in meiosis result in
four daughter cells forming from an original
parent cell, each with a haploid (n) set of chromosomes.
In sexual reproduction, two haploid gametes (n + n) fuse to
form a complete 2n zygote.
Homologous chromosomes: A pair of
chromosomes, one from each parent, carrying
genes for the same traits, in the same order.

Synapsis: homologous chromosomes begin to pair
closely along their entire length.
Meiosis is quite similar to mitosis.
However, a second cell division takes place that is
not preceded by another round of DNA replication. As a
result, instead of having a pair of copies of each gene
(as in a diploid cell), each product has only one copy
of each gene (a haploid cell). These haploid cells become
the gametes, either sperm or eggs. Thus, only one copy of a gene is
passed on to each gamete. It is not until the sperm
and egg join that the two halves of the genetic
information are brought together. This process is the basis for all of
Mendel's laws.
An extremely important feature of
meiosis I is that during synapsis, when
homologous chromosomes are paired
together, crossovers occur.
Gametogenesis:
Male meiosis produces four spermatozoa
from a single germ cell precursor; female
meiosis produces just one oocyte from a
single germ cell precursor, discarding
two polar bodies in the process.
Mutations

The central dogma explains how
information in DNA is converted into the
sequence of amino acids in a protein.
How do changes in DNA correlate with
changes in the sequence of amino acids in a
protein?
Mutations are permanent, sometimes
transmissible (if the change is to a germ cell)
changes to the genetic material (usually
DNA or RNA) of a cell. Mutations can be
caused by copying errors in the genetic
material during cell division and by
exposure to radiation, chemicals, or viruses.
In multicellular organisms, mutations
can be subdivided into germline
mutations, which can be passed on to
progeny and somatic mutations, which
(when accidental) often lead to the
malfunction or death of a cell and can
cause cancer.
The simplest change is a substitution
of one nucleotide for another, called
a “point mutation”.
a. silent mutation:
a change in a base pair does not result in a change
in the sequence of amino acids in a protein.
b. missense mutation:
a mutation results in a change in an amino acid,
where the new amino acid has different properties
from the old amino acid.
The protein with the new primary structure may have
reduced or no activity.
c. nonsense mutation:
a mutation results in a new translation stop codon
formed before the naturally occurring one.
Translation is stopped prematurely and a shortened
protein is made.
d. frameshift mutation:
a deletion or insertion of one base (or of any number of
bases not divisible by three) results in a change in
the translational reading frame.
A reading frame is a contiguous and non-overlapping
set of three-nucleotide codons in
DNA or RNA. There are 3 possible reading
frames in a strand. A reading frame that
contains no stop codon is called an open
reading frame (ORF).
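The four categories above can be told apart by comparing the protein encoded before and after the change. The Python below is a simplified sketch under stated assumptions: the sequences are hypothetical, each coding sequence starts at its first base, and an in-frame insertion or deletion (a multiple of three bases) is not given its own category. The names `translate`, `reading_frames`, and `classify_mutation` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reading_frames(strand: str) -> list[list[str]]:
    """Split one strand into codons for each of its three reading frames."""
    return [
        [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
        for offset in range(3)
    ]


def translate(cds: str) -> str:
    """Translate a DNA coding sequence from its first base to the first stop."""
    protein = []
    for codon in reading_frames(cds)[0]:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_mutation(reference: str, mutant: str) -> str:
    """Classify a mutation by its effect on the encoded protein (simplified)."""
    if (len(mutant) - len(reference)) % 3 != 0:
        return "frameshift"
    ref_protein, mut_protein = translate(reference), translate(mutant)
    if mut_protein == ref_protein:
        return "silent"
    if len(mut_protein) < len(ref_protein) and ref_protein.startswith(mut_protein):
        return "nonsense"
    return "missense"


reference = "ATGGAAAAATGGTAA"  # hypothetical gene: Met-Glu-Lys-Trp-stop
print(classify_mutation(reference, "ATGGAGAAATGGTAA"))  # silent (GAA -> GAG)
print(classify_mutation(reference, "ATGGTAAAATGGTAA"))  # missense (GAA -> GTA)
print(classify_mutation(reference, "ATGGAATAATGGTAA"))  # nonsense (AAA -> TAA)
print(classify_mutation(reference, "ATGGAAAATGGTAA"))   # frameshift (one A deleted)
```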
Exon skipping

Splicing of an intron requires an essential signal: "GT........AG".
If the splice acceptor site AG is mutated (e.g., A to C),
the splicing machinery will look for the next acceptor
site. As a result, the exon between two introns is also removed.
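Exon skipping can be illustrated with a toy model in which a gene is a list of already-annotated exon and intron segments. This is only a sketch: the `splice` function, the segment layout, and the sequences are assumptions made for illustration, and the model deliberately checks nothing except the AG acceptor at the end of each intron.

```python
def splice(segments: list[tuple[str, str]]) -> str:
    """Toy splicing model over ("exon" | "intron", DNA sequence) segments.

    An intron is removed from its GT donor to the next intact AG acceptor.
    If an intron's own acceptor is mutated, everything up to the next
    intact acceptor is removed, including the exon in between.
    """
    mature = []
    searching_for_acceptor = False
    for kind, sequence in segments:
        if kind == "intron":
            # The removed region closes only at an intron ending in an intact AG.
            searching_for_acceptor = not sequence.endswith("AG")
        elif not searching_for_acceptor:
            mature.append(sequence)
    return "".join(mature)


# Hypothetical three-exon gene (DNA notation, as in the GT...AG rule).
normal_gene = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTCTTAG"),
    ("exon", "ATTGTA"),
    ("intron", "GTACGATTCAG"),
    ("exon", "TGGTAA"),
]
# Same gene with the first acceptor site mutated from AG to CG (A to C).
mutant_gene = list(normal_gene)
mutant_gene[1] = ("intron", "GTAAGTCTTCG")

print(splice(normal_gene))  # ATGGCCATTGTATGGTAA  (exons 1 + 2 + 3)
print(splice(mutant_gene))  # ATGGCCTGGTAA        (exon 2 skipped)
```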
The end
