
HUMAN GENOME

Dr. Victor M. Saavedra Alanis


Department of Biochemistry
Faculty of Medicine, UASLP

The attached material is from the Molecular Genetics section of Harrison's
Internal Medicine (Ed. 16 or 1; the information is very similar). You may consult
the Spanish edition; for those who do not have it, the English version is attached here.
It is quite up to date, and the information will be summarized in the presentation.

Harrison's Internal Medicine


Chapter 62. Principles of Human Genetics

Chromosomes and DNA Replication

Organization of DNA into Chromosomes

Size of the Human Genome

The human genome is divided into 23 different chromosomes, including 22
autosomes (numbered 1–22) and the X and Y sex chromosomes. Adult cells
are diploid, meaning they contain two homologous sets of 22 autosomes and a
pair of sex chromosomes. Females have two X chromosomes (XX), whereas
males have one X and one Y chromosome (XY). As a consequence of meiosis,
germ cells (sperm or oocytes) are haploid and contain one set of 22 autosomes
and one of the sex chromosomes. At the time of fertilization, the diploid genome
is reconstituted by pairing of the homologous chromosomes from the mother
and father. With each cell division (mitosis), chromosomes are replicated,
paired, segregated, and divided into two daughter cells (Chap. 63).
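The chromosome counts in the paragraph above can be checked with a toy Python model. This is a minimal sketch under strong simplifying assumptions: each chromosome is just a label, homologs are not distinguished, recombination is ignored, and the function names `diploid_karyotype`, `gamete`, and `fertilize` are hypothetical. It only shows how meiosis halves the set and fertilization restores it.

```python
import random

AUTOSOMES = [str(n) for n in range(1, 23)]  # autosomes numbered 1-22

def diploid_karyotype(sex: str) -> list[str]:
    """Two copies of each autosome plus XX (female) or XY (male)."""
    sex_pair = ["X", "X"] if sex == "female" else ["X", "Y"]
    return [c for c in AUTOSOMES for _ in range(2)] + sex_pair

def gamete(karyotype: list[str], rng: random.Random) -> list[str]:
    """Simplified meiosis: keep one copy of each autosome and one sex chromosome."""
    haploid = list(AUTOSOMES)
    sex_chromosomes = [c for c in karyotype if c in ("X", "Y")]
    haploid.append(rng.choice(sex_chromosomes))
    return haploid

def fertilize(egg: list[str], sperm: list[str]) -> list[str]:
    """Fertilization reconstitutes the diploid set from two haploid sets."""
    return egg + sperm

rng = random.Random(0)
mother, father = diploid_karyotype("female"), diploid_karyotype("male")
egg, sperm = gamete(mother, rng), gamete(father, rng)
zygote = fertilize(egg, sperm)
print(len(mother), len(egg), len(zygote))  # 46 23 46
print("male" if "Y" in zygote else "female")  # depends on which sex chromosome the sperm carried
```

Because the egg can only carry an X, the sex chromosome contributed by the sperm decides whether the zygote is XX or XY.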

The human genome is estimated to contain ~30,000–40,000 genes, fewer than
initially predicted, which are divided among the 23 chromosomes. A
gene is a functional unit that is regulated by transcription (see below) and
encodes an RNA product, which is most commonly, but not always, translated
into a protein that exerts activity within or outside the cell. Historically, genes
were identified because they conferred specific traits that are transmitted from
one generation to the next. Increasingly, they are characterized based on
expression in various tissues. The number of genes greatly underestimates the
complexity of genetic expression, as single genes can generate multiple spliced
mRNA products, which are translated into proteins that are subject to complex
posttranslational modification, such as phosphorylation. Proteomics, the study
of the proteome using technologies of large-scale protein separation and
identification, is an emerging field focused on protein variation and function.
Similarly, the field of metabolomics aims at determining the composition and
modifications of the metabolome, the complement of low-molecular-weight
molecules, many of which participate in various metabolic functions. These
analyses, which are heavily dependent on bioinformatics, reveal that
physiologic or pathologic alterations have myriad effects on the proteome and
the metabolome and emphasize that these processes involve modular networks
rather than linear pathways.

Human DNA consists of ~3 billion base pairs (bp) of DNA per haploid genome.
DNA length is normally measured in units of 1000 bp (kilobases, kb) or
1,000,000 bp (megabases, Mb). Not all DNA encodes genes. In fact, genes
account for only ~10–15% of DNA. Much of the remaining DNA consists of
highly repetitive sequences, the function of which is poorly understood. These
repetitive DNA regions, along with nonrepetitive sequences that do not encode
genes, may serve a structural role in the packaging of DNA into chromatin, i.e.,
DNA bound to histone proteins, and chromosomes (Fig. 62-1). If only 10% of
DNA is expressed and there are 30,000 genes, the average gene would be ~10
kb in length. Although many genes are about this size, the range is quite broad.
For example, some genes are only a few hundred bp, whereas others, such as
the DMD gene, are extraordinarily large (2 Mb).
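The average-gene-length estimate above is simple arithmetic: expressed DNA divided by the number of genes. The Python sketch below reproduces it using only the figures given in the text; the even split across genes is an assumption, and the helper name `average_gene_length_bp` is hypothetical.

```python
# Figures taken from the text; spreading expressed DNA evenly over genes is an assumption.
GENOME_BP = 3_000_000_000   # ~3 billion bp per haploid genome
EXPRESSED_FRACTION = 0.10   # "if only 10% of DNA is expressed"
N_GENES = 30_000            # lower end of the ~30,000-40,000 estimate

KB = 1_000                  # 1 kilobase (kb) = 1000 bp
MB = 1_000_000              # 1 megabase (Mb) = 1,000,000 bp

def average_gene_length_bp(genome_bp: int, fraction: float, n_genes: int) -> float:
    """Expressed DNA divided evenly among all genes."""
    return genome_bp * fraction / n_genes

avg = average_gene_length_bp(GENOME_BP, EXPRESSED_FRACTION, N_GENES)
print(f"average gene: {avg / KB:.0f} kb")                        # average gene: 10 kb
print(f"DMD gene (2 Mb): {2 * MB / avg:.0f} times the average")  # DMD gene (2 Mb): 200 times the average
```

With the upper estimate of 40,000 genes, the same arithmetic gives 7,500 bp (7.5 kb), so the ~10 kb figure is best read as an order-of-magnitude average.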

Figure 62-1
Structure of chromatin and chromosomes. Chromatin is composed of double-
strand DNA that is wrapped around histone and nonhistone proteins forming
nucleosomes. The nucleosomes are further organized into solenoid structures.
Chromosomes assume their characteristic structure, with short (p) and long (q)
arms at the metaphase stage of the cell cycle.

Structure of DNA

Each gene is composed of a linear polymer of DNA. DNA is a double-stranded
helix composed of four different bases: adenine (A), thymine (T), guanine (G),
and cytosine (C). Adenine pairs with thymine, and guanine pairs with
cytosine, by hydrogen bond interactions that span the double helix. DNA has
several remarkable features that make it ideal for the transmission of genetic
information. It is relatively stable, at least in comparison to RNA or proteins. The
double-stranded nature of DNA and its feature of strict base-pair
complementarity permit faithful replication during cell division. As described
below, complementarity also allows the transmission of genetic information from
DNA → RNA → protein (Fig. 62-2). Messenger RNA (mRNA) is transcribed from the
template strand and therefore carries the sequence of the so-called sense or
coding strand of the DNA double helix (with uracil in place of thymine); it is
translated into proteins by ribosomes.
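Strict base-pair complementarity means either strand fully determines the other. The Python sketch below illustrates that idea and the relationship between the template strand, the coding strand, and mRNA. The sequence is hypothetical and the function names are illustrative; strands are written 5'→3', so the partner strand is the reverse complement because the two strands are antiparallel.

```python
# Watson-Crick pairing rules for DNA
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' (the strands are antiparallel)."""
    return "".join(PAIR[base] for base in reversed(strand))

def transcribe(template_strand: str) -> str:
    """mRNA (5'->3') made from a template strand given 5'->3'.

    The mRNA is complementary to the template, so it has the coding-strand
    sequence with uracil (U) in place of thymine (T).
    """
    coding_strand = reverse_complement(template_strand)
    return coding_strand.replace("T", "U")

coding = "ATGGCTTGGTAA"                # hypothetical coding (sense) strand
template = reverse_complement(coding)  # the complementary template strand
print(template)                        # TTACCAAGCCAT
print(transcribe(template))            # AUGGCUUGGUAA
# Copying the copy returns the original: the basis of faithful replication.
print(reverse_complement(template) == coding)  # True
```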

Figure 62-2

Flow of genetic information. Multiple extracellular signals activate intracellular
signal cascades that result in altered regulation of gene expression through the
interaction of transcription factors with regulatory regions of genes. RNA
polymerase transcribes DNA into RNA that is processed to mRNA by excision of
intronic sequences. The mRNA is translated into a polypeptide chain to form the
mature protein after undergoing posttranslational processing. HAT, histone
acetyl transferase; CBP, CREB-binding protein; CREB, cyclic AMP response
element–binding protein; CRE, cyclic AMP responsive element; CoA,
coactivator; TAF, TBP-associated factors; GTF, general transcription factors; TBP,
TATA-binding protein; TATA, TATA box; RE, response element; NH2,
aminoterminus; COOH, carboxyterminus.

The presence of four different bases provides surprising genetic diversity. In the
protein-coding regions of genes, the DNA bases are arranged into codons, a
triplet of bases that specifies a particular amino acid. It is possible to arrange
the four bases into 64 different triplet codons (4³). Each codon specifies 1 of the
20 different amino acids, or a regulatory signal, such as initiation and stop of
translation. Because there are more codons than amino acids, the genetic code
is degenerate; that is, most amino acids can be specified by several different
codons. By arranging the codons in different combinations and in various
lengths, it is possible to generate the tremendous diversity of primary protein
structure.
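The codon arithmetic and the degeneracy of the code can be made concrete in a few lines of Python. The sketch encodes the standard genetic code (NCBI translation table 1) as a 64-letter string in TCAG order, with '*' marking stop codons and T used in place of U (DNA alphabet). The compact string encoding is a presentation choice, not something taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(len(CODON_TABLE))      # 64, i.e. 4**3 triplet codons

# Degeneracy: how many codons specify each amino acid (or stop)
degeneracy = Counter(CODON_TABLE.values())
print(len(degeneracy) - 1)   # 20 amino acids, excluding '*'
print(degeneracy["L"], degeneracy["M"], degeneracy["*"])  # 6 1 3
print(CODON_TABLE["ATG"])    # M (methionine; ATG is also the usual initiation codon)
```

Leucine, serine, and arginine are each encoded by six codons, whereas methionine and tryptophan have only one, which is the degeneracy the paragraph describes.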

Regulation of Gene Expression

Mechanisms that regulate gene expression play a critical role in the function of
genes. The transcription of genes is controlled primarily by transcription factors
that bind to DNA sequences in the regulatory regions of genes. As described
below, mutations in transcription factors cause a significant number of genetic
disorders. Gene expression is also influenced by epigenetic events, such as X-
inactivation and imprinting, processes in which DNA methylation or histone
modifications are associated with gene silencing. Several genetic disorders,
such as Prader-Willi syndrome (neonatal hypotonia, developmental delay,
obesity, short stature, and hypogonadism) and Albright hereditary
osteodystrophy (resistance to parathyroid hormone, short stature,
brachydactyly, resistance to other hormones in certain subtypes), exhibit the
consequences of genomic imprinting. Most studies of gene expression have
focused on the regulatory DNA elements of genes that control transcription.
However, it should be emphasized that gene expression requires a series of
steps, including mRNA processing, protein translation, and posttranslational
modifications, all of which are actively regulated (Fig. 62-2).

The new field of functional genomics is based on the concept that
understanding alterations of gene expression under various physiologic and
pathologic conditions provides insight into the underlying processes, and by
revealing certain gene expression profiles, this knowledge may be of diagnostic
and therapeutic relevance. The large-scale study of expression profiles, which
takes advantage of microarray technologies, is also referred to as
transcriptomics because the complement of mRNAs transcribed by the cellular
genome is called the transcriptome.
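An expression profile compares mRNA levels across conditions. The Python sketch below shows one common way such a comparison is summarized, as a log2 fold change per gene with a cutoff. The gene names and values are hypothetical, and the 2-fold cutoff (|log2 fold change| ≥ 1) is an assumption. Real microarray analysis also requires normalization and statistical testing, which are omitted here.

```python
import math

# Hypothetical normalized expression values (arbitrary units)
control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 80.0}
disease = {"GENE_A": 400.0, "GENE_B": 25.0, "GENE_C": 88.0}

def log2_fold_changes(reference: dict[str, float], condition: dict[str, float]) -> dict[str, float]:
    """log2(condition / reference) for each gene measured in both samples."""
    return {g: math.log2(condition[g] / reference[g]) for g in reference if g in condition}

def expression_profile(fold_changes: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Label each gene up, down, or unchanged (assumed cutoff: 2-fold)."""
    profile = {}
    for gene, lfc in fold_changes.items():
        if lfc >= threshold:
            profile[gene] = "up"
        elif lfc <= -threshold:
            profile[gene] = "down"
        else:
            profile[gene] = "unchanged"
    return profile

lfc = log2_fold_changes(control, disease)
print(expression_profile(lfc))  # {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```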

Structure of Genes

A gene product is usually a protein but can occasionally consist of RNA that is
not translated (e.g., microRNAs). Exons are the portions of a gene that are
eventually spliced together to form mRNA. Introns are the intervening regions
between exons that are spliced out of precursor RNAs during RNA
processing (Fig. 62-2).
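Splicing can be pictured as keeping the exon segments of a precursor RNA and discarding what lies between them. The Python sketch below assumes exon boundaries are already known (0-based, end-exclusive coordinates); the sequence and function names are hypothetical. It also checks the GU...AG ends that most introns carry, which is a general observation rather than something stated in the text, and real splice-site recognition involves far more sequence context.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in genomic order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

def introns(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """The segments lying between consecutive exons."""
    ordered = sorted(exons)
    return [pre_mrna[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]

# Hypothetical precursor RNA: exon 1 + intron + exon 2
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUUAG", "UUCUAA"
pre = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]

print(splice(pre, exons))  # AUGGCCUUCUAA
for seq in introns(pre, exons):
    print(seq, seq.startswith("GU") and seq.endswith("AG"))  # GUAAGUCCUUAG True
```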

The gene locus also includes regions that are necessary to control its
expression. The regulatory regions most commonly involve sequences
upstream (5') of the transcription start site, although there are also examples of
control elements within introns or downstream of the coding regions of a gene.
The upstream regulatory regions are also referred to as the promoter. The
minimal promoter usually consists of a TATA box (which binds TATA-binding
protein, TBP) and initiator sequences that enhance the formation of an active
transcription complex. A gene may generate various transcripts through the use
of alternative promoters and/or alternative splicing of exons, mechanisms that
contribute to the enormous diversity of proteins and their functions.
Transcriptional termination signals reside downstream, or 3', of a gene. Specific
sequences, such as the AAUAAA sequence at the 3' end of the mRNA,
designate the site for polyadenylation (poly-A tail), a process that influences
mRNA transport to the cytoplasm, stability, and translation efficiency. A rigorous
test of the regulatory region boundaries involves expressing a gene in a
transgenic animal to determine whether the isolated DNA flanking sequences
are sufficient to recapitulate the normal developmental, tissue-specific, and
signal-responsive features of the endogenous gene. This has been
accomplished for only a few genes; there are many examples in which large
genomic fragments only partially reconstitute normal gene regulation in vivo,
implying the presence of distant regulatory sequences. Genome-wide analyses
of selected transcription factor binding sites, such as for the estrogen receptor,
reveal that the majority of regulatory sites are very distant from the transcription
start sites of genes. A detailed understanding of mechanisms that regulate
genes is also relevant for gene therapy strategies that require normal gene
regulation (Chap. 65).
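Two of the sequence elements named above, the TATA box of the minimal promoter and the AAUAAA polyadenylation signal, can be located with a simple pattern search. The Python sketch below is illustrative only. The TATA consensus used here, TATAWAW (W = A or T), is a common textbook simplification and an assumption of this example; real promoter analysis relies on position weight matrices and experimental evidence, and, as the text notes, many regulatory sites lie far from the gene. Both sequences are hypothetical.

```python
import re

# Simplified consensus patterns (assumptions; real elements vary)
TATA_BOX = r"TATA[AT]A[AT]"   # TATAWAW on the DNA coding strand
POLY_A_SIGNAL = r"AAUAAA"     # polyadenylation signal near the mRNA 3' end

def find_motif(seq: str, pattern: str) -> list[int]:
    """0-based start positions of every match, including overlapping ones."""
    # The zero-width lookahead lets re.finditer report overlapping hits.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

promoter = "GGCTATAAAAGGCACC"    # hypothetical upstream (5') DNA sequence
mrna_3prime = "CUGAAUAAAGCUUUU"  # hypothetical 3' end of an mRNA

print(find_motif(promoter, TATA_BOX))          # [3]
print(find_motif(mrna_3prime, POLY_A_SIGNAL))  # [3]
```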

The number of DNA sequences and transcription factors that regulate
transcription is much greater than originally anticipated. Most genes contain at
least 15–20 discrete regulatory elements within 300 bp of the transcription start
site. This densely packed promoter region often contains binding sites for
ubiquitous transcription factors such as CAAT box/enhancer binding protein
(C/EBP), cyclic AMP response element–binding (CREB) protein, selective
promoter factor 1 (Sp-1), or activator protein 1 (AP-1). However, factors
involved in cell-specific expression may also bind to these sequences. For
example, basic helix-loop-helix (bHLH) proteins bind to E-boxes in the
promoters of myogenic genes, and steroidogenic factor 1 (SF-1) binds to a
specific recognition site in the regulatory region of multiple steroidogenic
enzyme genes. Key regulatory elements may also reside at a large distance
from the proximal promoter. The globin and the immunoglobulin genes, for
example, contain locus control regions that are several kilobases away from the
structural sequences of the gene. Specific groups of transcription factors that
bind to these promoter and enhancer sequences provide a combinatorial code
for regulating transcription. In this manner, relatively ubiquitous factors interact
with more restricted factors to allow each gene to be expressed and regulated
in a unique manner that is dependent on developmental state, cell type, and
numerous extracellular stimuli. As described below, the transcription factors that
bind to DNA actually represent only the first level of regulatory control. Other
proteins—coactivators and co-repressors—interact with the DNA-binding
transcription factors to generate large regulatory complexes. These complexes
are subject to control by numerous cell-signaling pathways, including
phosphorylation, acetylation, sumoylation, and ubiquitinylation. Ultimately, the
recruited transcription factors interact with, and stabilize, components of the
basal transcription complex that assembles at the site of the TATA box and
initiator region. This basal transcription factor complex consists of >30 different
proteins. Gene transcription occurs when RNA polymerase begins to synthesize
RNA from the DNA template.
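The idea of a combinatorial code, in which ubiquitous factors combine with restricted ones, can be illustrated with a deliberately oversimplified Boolean model in Python: a gene is "on" only when every factor it requires is present. The factor names come from the text, but the gene names, the requirement sets, and the all-or-nothing logic are hypothetical. Real regulation involves coactivators, corepressors, signaling inputs, and graded rather than on/off effects.

```python
# Hypothetical requirement sets: a gene is transcribed only if all its factors are present.
GENE_REQUIREMENTS = {
    "housekeeping_gene": {"Sp-1"},
    "myogenic_gene": {"Sp-1", "bHLH"},
    "steroidogenic_gene": {"CREB", "SF-1"},
}

def active_genes(factors_present: set[str]) -> list[str]:
    """Genes whose required factors are a subset of those present in the cell."""
    return sorted(g for g, needed in GENE_REQUIREMENTS.items() if needed <= factors_present)

muscle_cell = {"Sp-1", "CREB", "AP-1", "bHLH"}    # ubiquitous factors + a muscle-restricted one
adrenal_cell = {"Sp-1", "CREB", "AP-1", "SF-1"}   # ubiquitous factors + a steroidogenic one

print(active_genes(muscle_cell))   # ['housekeeping_gene', 'myogenic_gene']
print(active_genes(adrenal_cell))  # ['housekeeping_gene', 'steroidogenic_gene']
```

The same ubiquitous factors appear in both cell types; the restricted factor decides which cell-specific gene is switched on.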

Mutations can occur in all domains of a gene (Fig. 62-4). A point mutation
occurring within the coding region leads to an amino acid substitution if the
codon is altered. Point mutations that introduce a premature stop codon result
in a truncated protein. Large deletions may affect a portion of a gene or an
entire gene, whereas small deletions and insertions alter the reading frame if
they do not represent a multiple of three bases. These "frameshift" mutations
lead to an entirely altered carboxy terminus. Mutations occurring in regulatory or
intronic regions may result in altered expression or splicing of genes. Examples
are shown in Fig. 62-5.
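The consequences of coding-region mutations described above follow directly from the triplet code. The Python sketch below classifies a single-base substitution by comparing the codon before and after the change, and an insertion or deletion by whether its length is a multiple of three. It repeats the standard codon table so it stands alone, assumes the sequence is read in frame from position 0, and uses a hypothetical sequence and function names. Regulatory and splice-site mutations are not modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(coding: str, pos: int, new_base: str) -> str:
    """Effect of replacing coding[pos] (0-based; reading frame starts at 0)."""
    start = pos - pos % 3                 # first base of the affected codon
    offset = pos - start
    old_codon = coding[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if old_aa == "*":
        return "stop-loss"
    if new_aa == "*":
        return "nonsense"                 # premature stop -> truncated protein
    return "missense"                     # amino acid substitution

def classify_indel(length: int) -> str:
    """Indels shift the reading frame unless their length is a multiple of 3."""
    return "in-frame" if length % 3 == 0 else "frameshift"

coding = "ATGGCTTGGTAA"  # hypothetical: Met-Ala-Trp-stop
print(classify_substitution(coding, 4, "T"))  # missense (GCT->GTT, Ala->Val)
print(classify_substitution(coding, 5, "C"))  # silent (GCT->GCC, still Ala)
print(classify_substitution(coding, 8, "A"))  # nonsense (TGG->TGA, Trp->stop)
print(classify_indel(1), classify_indel(3))   # frameshift in-frame
```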

Figure 62-4
Point mutations causing β-thalassemia as an example of allelic heterogeneity.
The β-globin gene is located in the globin gene cluster. Point mutations can
be located in the promoter, the CAP site, the 5'-untranslated region, the initiation
codon, each of the three exons, the introns, or the polyadenylation signal. Many
are missense or nonsense mutations, whereas others cause
defective RNA splicing. Not shown here are deletion mutations of the β-globin
gene or larger deletions of the globin locus that can also result in
thalassemia. Symbols in the figure mark promoter mutations, the CAP site (*), the
5'UTR, the initiation codon, defective RNA processing, missense and nonsense
mutations, and the poly A signal.

Figure 62-5
A. Examples of mutations. The coding strand is shown with the encoded amino
acid sequence. B. Chromatograms of sequence analyses after amplification of
genomic DNA by polymerase chain reaction.

Transcriptional Activation and Repression

Every gene is controlled uniquely, whether in its spatial or temporal pattern of
expression or in its response to extracellular signals. It is estimated that
transcription factors account for ~30% of expressed genes. A growing number
of identified genetic diseases involve transcription factors (Table 62-2). The
MODY (maturity-onset diabetes of the young) disorders are representative of
this group of diseases; mutations in several different islet cell–specific
transcription factors cause various forms of MODY (Chap. 338).

Table 62-2 Selected Examples of Diseases Caused by Mutations and Rearrangements in Transcription Factor Classes

| Transcription factor class | Example | Associated disorder |
|---|---|---|
| Nuclear receptors | Androgen receptor | Complete or partial androgen insensitivity (recessive missense mutations); spinobulbar muscular atrophy (CAG repeat expansion) |
| Zinc finger proteins | WT1 | WAGR syndrome: Wilms tumor, aniridia, genitourinary malformations, mental retardation |
| Basic helix-loop-helix | MITF | Waardenburg syndrome type 2A |
| Homeobox | IPF1 | Maturity-onset diabetes of the young (MODY) type 4 (heterozygous mutation/haploinsufficiency); pancreatic agenesis (homozygous mutation) |
| Leucine zipper | Retina leucine zipper (NRL) | Autosomal dominant retinitis pigmentosa |
| High mobility group (HMG) proteins | SRY | Sex reversal |
| Forkhead | HNF4α, HNF1α, HNF1β | MODY types 1, 3, and 5 |
| Paired box | PAX3 | Waardenburg syndrome types 1 and 3 |
| T-box | TBX5 | Holt-Oram syndrome (thumb anomalies, atrial or ventricular septal defects, phocomelia) |
| Cell cycle control proteins | p53 | Li-Fraumeni syndrome, other cancers |
| Coactivators | CREB-binding protein (CBP) | Rubinstein-Taybi syndrome |
| General transcription factors | TATA-binding protein (TBP) | Spinocerebellar ataxia 17 (CAG expansion) |
| Transcription elongation factor | VHL | Von Hippel–Lindau syndrome (renal cell carcinoma, pheochromocytoma, pancreatic tumors, hemangioblastomas); autosomal dominant inheritance, somatic inactivation of second allele (Knudson two-hit model) |
| Runt | CBFA2 | Familial thrombocytopenia with propensity to acute myelogenous leukemia |
| Chimeric proteins due to translocations | PML-RARα | Acute promyelocytic leukemia, t(15;17)(q22;q11.2-q12) translocation |

Note: Selected abbreviations include: SRY, sex determining region Y; HNF,
hepatocyte nuclear factor; CBP, CREB (cAMP response element–binding protein)
binding protein; VHL, Von Hippel–Lindau; PML, promyelocytic leukemia; RAR, retinoic
acid receptor.

Transcriptional activation can be divided into three main mechanisms:

1. Events that alter chromatin structure can enhance the access of
transcription factors to DNA. For example, histone acetylation generally
opens chromatin structure and is correlated with transcriptional
activation.
2. Posttranslational modifications of transcription factors, such as
phosphorylation, can induce the assembly of active transcription
complexes. As an example, phosphorylation of CREB protein on serine
133 induces a conformational change that allows the recruitment of
CREB-binding protein (CBP), a coactivator with histone acetyltransferase
activity that integrates the actions of many transcription factors.
3. Transcriptional activators can displace a repressor protein. This
mechanism is particularly common during development when the pattern
of transcription factor expression changes dynamically.

Of course, these mechanisms are not mutually exclusive, and most genes are
activated by some combination of these events.

Suppression of gene expression is as important as gene activation in the control
of cell differentiation and function. Some mechanisms of repression are the
corollary of activation. For example, repression is often associated with histone
deacetylation or protein dephosphorylation. For nuclear hormone receptors,
transcriptional silencing involves the recruitment of repression complexes that
contain histone deacetylase activity. Aberrant expression of repressor proteins
is sometimes associated with neoplasia. The t(15;17) chromosomal
translocation that occurs in promyelocytic leukemia fuses the PML gene to a
portion of the retinoic acid receptor α (RARα) gene (Table 62-2). This
event causes unregulated transcriptional repression in a manner that precludes
normal cellular differentiation. The addition of the RARα ligand, retinoic acid,
activates the receptor, thereby relieving repression and allowing cells to
differentiate and ultimately undergo apoptosis. This mechanism has therapeutic
importance as the addition of retinoic acid to treatment regimens induces a
higher remission rate in patients with promyelocytic leukemia (Chap. 104).
Methylation of promoter regions is frequently found in neoplasms and silences
gene expression.
