
CBG.

02: Genomes

ENCODE PROJECT WRITES EULOGY FOR JUNK DNA

- human DNA has 3 billion bases


- instead of the expected 100,000 genes, the
initial analysis found about 35,000 and that
number has since been whittled down to about
21,000
- 80% of the human genome serves some purpose, biochemically speaking:
o specifying landing spots for proteins that influence gene activity
o strands of RNA with myriad roles
o places where chemical modifications serve to silence stretches of
our chromosomes
- a gene's regulation is far more complex than previously thought,
being influenced by multiple stretches of regulatory DNA located both
near and far from the gene itself and by strands of RNA not translated
into proteins (= noncoding RNA)
- 11,224 DNA stretches are classified as pseudogenes, "dead" genes now
known to be active in some cell types or individuals
- there are many genes in which DNA codes for RNA, not a protein, as
the end product
- various RNAs home in on different cell compartments, as if they have
fixed addresses where they operate: some go to the nucleus, some to
the nucleolus and some to the cytoplasm

GINGERAS: the fundamental unit of the genome and the basic unit of
heredity should be the transcript (the piece of RNA decoded from DNA)
and not the gene

- 5% of the human genome is conserved across mammals
- DNA's bases function in gene regulation through their interactions
with transcription factors and other proteins
- 8% of the genome falls within a transcription factor binding site, a
percentage that is expected to double once more transcription factors
have been tested

ON THE IMMORALITY OF TELEVISION SETS: FUNCTION IN THE HUMAN GENOME
ACCORDING TO THE EVOLUTION-FREE GOSPEL OF ENCODE (Dan Graur)

- less than 10% of the genome is evolutionarily conserved through
purifying selection
- according to ENCODE, a biological function can be maintained
indefinitely without selection, which implies that at least
80 - 10 = 70% of the genome is perfectly invulnerable to deleterious
mutations, either because no mutation can ever occur in these
functional regions or because no mutation in these regions can be
deleterious
- problems in ENCODE logic:
o employing the seldom-used causal role definition of biological
function and then applying it inconsistently to different biochemical
properties
o committing the logical fallacy of affirming the consequent
o failing to appreciate the crucial difference between junk DNA and
garbage DNA
o using analytical methods that yield biased errors and inflate
estimates of functionality
o favouring statistical sensitivity over specificity
o emphasizing statistical significance rather than the magnitude of
the effect
- in biology, there are 2 main concepts of function:
o selected effect: the function of a trait is the effect for which it
was selected, or by which it is maintained
o causal role: ahistorical and non-evolutionary: for a trait Q to have
a causal role function, G, it is necessary and sufficient that Q
performs G
- example: the sequence TATAAA is maintained by natural selection to
bind a transcription factor; a mutated sequence, resembling this one,
also binds the transcription factor, but does not result in
transcription (no adaptive or maladaptive consequence); hence, the
second sequence has no selected effect function, but its causal role
function is to bind a transcription factor
- from an evolutionary viewpoint, a function can be assigned to a DNA
sequence if and only if it is possible to destroy it; unless a genomic
functionality is actively protected by selection, it will accumulate
deleterious mutations and will cease to be functional
- the fact that it is sometimes difficult to identify selection should
never be used as a justification to ignore selection altogether in
assigning functionality to parts of the human genome
- the surest indicator of the existence of a genomic function is that
losing it has some phenotypic consequence for the organism
- functional regions of the genome should evolve more slowly and be
more conserved among species than non-functional ones
- Ward and Kellis confirmed that approx. 5% of the genome is
interspecifically conserved and an additional 4% of the human genome is
under selection
- According to ENCODE:
o 74.7% of the genome is transcribed
o 56.1% is associated with modified histones
o 15.2% is found in open-chromatin areas
o 8.5% binds transcription factors
o 4.6% consists of methylated CpG dinucleotides
- transcription is fundamentally a stochastic process
- classes of sequences that are known to be abundantly transcribed, but
are typically devoid of function:
o pseudogenes (up to 1/10 transcribed; lack coding potential due to the
presence of disruptive mutations, evolve very rapidly and are mostly
subject to no functional constraint)
o introns (some human introns harbour regulatory sequences (TISHKOFF,
2006) as well as sequences that produce small RNA molecules (HIROSE,
2003; ZHOU, 2004))
o mobile elements
- less than 2% of the histone modifications may have something to do
with function
- the nematode Caenorhabditis elegans has 20,517 protein-coding genes
- misconceptions in common objections to junk DNA:
o a lack of knowledge of the original and correct sense of the term
o the belief that evolution can always get rid of nonfunctional DNA
o the belief that future potential constitutes a function
- in the majority of known bacterial species, selection against excess
genome is extremely efficient due to the enormous effective population
sizes, and the fact that replication time and, hence, generation time
are correlated with genome size

BRENNER: differentiated between junk DNA and garbage DNA; the excess
DNA in our genome is junk and it is there because it is harmless, as
well as being useless, and because the molecular processes generating
extra DNA outpace those getting rid of it

- indifferent DNA refers to DNA sites that are functional, but show no
evidence of selection against point mutations; deletion of these sites,
however, is deleterious, and is subject to purifying selection

PURPOSE IS THE ONLY THING EVOLUTION CANNOT PROVIDE

A SLIGHTLY DIFFERENT RESPONSE TO TODAY'S ENCODE HYPE

ENCODE: THE ROUGH GUIDE TO THE HUMAN GENOME

- For the last decade, geneticists have run a seemingly endless stream
of genome-wide association studies (GWAS), attempting to understand the
genetic basis of disease. They have thrown up a long list of SNPs
(variants at specific DNA letters) that correlate with the risk of
different conditions.
- The ENCODE team have mapped all of these to their data. They found
that just 12 percent of the SNPs lie within protein-coding areas. They
also showed that compared to random SNPs, the disease-associated ones
are 60 percent more likely to lie within functional, non-coding
regions, especially in promoters and enhancers. This suggests that many
of these variants are controlling the activity of different genes, and
provides many fresh leads for understanding how they affect our risk of
disease. "It was one of those too good to be true moments," says
Birney. "Literally, I was in the room [when they got the result] and I
went: Yes!"
- Imagine a massive table. Down the left side are all the diseases that
people have done GWAS studies for. Across the top are all the possible
cell types and transcription factors (proteins that control how genes
are activated) in the ENCODE study. Are there hotspots? Are there SNPs
that correspond to both? Yes. Lots, and many of them are new.
- Take Crohn's disease, a type of bowel disorder. The team found five
SNPs that increase the risk of Crohn's, and that are recognised by a
group of transcription factors called GATA2. "That wasn't something
that the Crohn's disease biologists had on their radar," says Birney.
"Suddenly we've made an unbiased association between a disease and a
piece of basic biology." In other words, it's a new lead to follow up
on.
- "We're now working with lots of different disease biologists looking
at their data sets," says Birney. "In some sense, ENCODE is working
from the genome out, while GWAS studies are working from disease in."
Where they meet, there is interest. So far, the team have identified
400 such hotspots that are worth looking into. Of these, between 50 and
100 were predictable. Some of the rest make intuitive sense. Others are
head-scratchers.

EVIDENCE OF ABUNDANT PURIFYING SELECTION IN HUMANS FOR RECENTLY
ACQUIRED REGULATORY FUNCTIONS

- although only 5% of the human genome is conserved across mammals, a
substantially larger portion is biochemically active, raising the
question of whether the additional elements
evolve neutrally or confer a lineage-specific fitness advantage
- mammalian conservation suggests that approx. 5% of the human genome
is conserved due to noncoding and regulatory roles, but more than 80%
is transcribed, bound by a regulator, or associated with chromatin
states suggestive of regulatory functions
- human constraint correlates with mammalian conservation, mRNA splice
sites and regulatory elements; similar selective pressures act in
humans and across mammals
- a substantial fraction of human constraint lies outside
mammalian-conserved regions
- regions that do not overlap with active ENCODE elements and inactive
chromatin states show lower constraint than ancestral repeats,
suggesting that they may provide a more accurate neutral reference than
repeats that can have exapted functions
- mammalian conserved regions lacking ENCODE activity show reduced
human constraint relative to active regions, suggesting recent loss in
function and activity
- these also show higher primate divergence relative to active regions,
suggesting that some loss of constraint predates human-macaque
divergence
- almost half of human constraint lies outside mammalian-conserved
regions, even though the strength of human constraint is higher in
conserved elements
- protein-coding constraint occurs primarily in conserved regions,
whereas regulatory constraint is primarily lineage-specific, as
proposed during mammalian radiation
- genome-wide association studies suggest that 85% of
disease-associated variants are noncoding, a fraction similar to the
proportion of human constraint that we estimate lies outside
protein-coding regions; this suggests that mutations outside conserved
elements play important roles in both human evolution and disease

MEGABASE DELETIONS OF GENE DESERTS RESULT IN VIABLE MICE

- the functional importance of the roughly 98% of the mammalian genome
not corresponding to protein-coding sequences remains largely
undetermined
- some large-scale deletions of the non-coding DNA (= gene deserts) can
be well tolerated by an organism
- although gene inactivation can sometimes fail to result in a
detectable phenotype, this is usually related to the removal of genes
with redundancy elsewhere in the genome
- the deleted gene deserts, mapping to mouse chromosome 3 and
chromosome 19 (with no evidence of transcription), contain 1,243
human-mouse conserved non-coding elements
- the boundaries of the deletions allowed proximate regulatory elements
near the flanking genes to remain intact
- the heterozygous mice appeared phenotypically normal
- the homozygous deletion mice for both deletions were viable
- the deletions were not lethal in embryos, as indicated by the approx.
1:2:1 ratio (wild-type : heterozygous : mutant homozygous)
- phenotypic parameters measured in the homozygous deletion mice,
compared with controls:
o post-natal survival rates for 25 weeks
o measurable growth retardation
o clinical chemistry tests (general and specific plasma parameters)
o morphological abnormalities
o abnormal growth
o tissue degeneration
o organ mass was similar in both groups of deletion mice and their
wild-type littermates
- molecular level impact: only 2 out of the 108 quantitative assays
revealed detectable alterations in levels of expression: Prkacb reduced
in the heart and Rpp30 reduced in the brain
- in MMU3 desert, beta-galactosidase expression
promoter

NUCLEOSOME

- each of our cells (or more correctly, nearly all of our cells)
contains a copy of this genome, encoded in 3 billion bp of DNA
- a collection of repair enzymes corrects chemical changes inflicted on
the strands by environmental insults
- nucleosomes protect the delicate strands from physical damage
- the job of the
nucleosome is paradoxical, requiring it to perform 2 opposite functions
simultaneously: on one hand, the nucleosome must be stable, forming
tight, sheltering structures that compact the DNA and keep it from
harm; on the other hand, the nucleosome must be labile enough to allow
the information in the DNA to be used; polymerases must be allowed
access to the DNA, both to transcribe mRNA for building new proteins
and to replicate the DNA when the cell divides; the method by which
nucleosomes solve these opposed needs is not well understood, but may
involve a partial unfolding of the DNA from around the nucleosome, one
loop at a time, as the information in the DNA is read
- nucleosomes also modify the activity of the genes that they store:
each nucleosome is composed of 8 histone proteins bundled tightly
together at the centre, encircled by 2 loops of DNA; the histone
proteins, however, are not completely globular like most other
proteins: they have long tails, which comprise nearly a quarter of
their length; the tails extend outward from the compact nucleosome,
reaching out to neighbouring nucleosomes and binding them tightly
together; the nucleus contains regulatory enzymes that chemically
modify these tails to weaken their interactions; in this way, the cell
makes particular genes more accessible to polymerases, allowing their
particular information to be copied and used to build new proteins
- the histone proteins are perfectly designed for their jobs, so much
so that histones are nearly identical in all non-bacterial organisms;
even slight modifications can be lethal
- the surface of the octamer is decorated with positively charged amino
acids that interact strongly with the negatively charged phosphate
groups of the DNA; this serves to glue the DNA strand to the protein
core

HUMAN GENOME PROJECT (TO KNOW OURSELVES)

- genetic linkage map: based on careful analyses of human inheritance
patterns; it indicates for each chromosome the whereabouts of genes or
other heritable markers, with distances measured in centimorgans
(= measure of recombination frequency)
- the closer 2 genes are on a single chromosome, the less likely they
are to get split up during genetic recombination
- when they are close enough that the chances of being separated are
only 1/100, they are said to be separated by a distance of 1
centimorgan
- physical maps: distances between features are measured not in genetic
terms, but in real physical units (base pairs)
- some regions of the genome resist cloning in YACs and others are
prone to rearrangements
- restriction enzymes: cleave dsDNA molecules at specific recognition
sites, usually 4 or 6 nucleotides long
- when digested with a particular restriction enzyme, then, identical
segments of human DNA yield identical sets of restriction fragments; on
the other hand, DNA from the same genomic region of 2 different people,
with their subtly different genomic sequences, can yield dissimilar
sets of fragments, which then produce different patterns when sorted
according to size
- a third necessary tool is some means of DNA amplification. The
classic example is the cloning vector, which may be circular DNA
molecules derived from bacteria or from bacteriophages (viruslike
parasites of bacteria), or artificial chromosomes constructed from
yeast or bacterial genomic DNA. The characteristic all these vectors
share is that fragments of foreign DNA can be inserted into them,
whereby the inserted DNA is replicated along with the rest of the
vector as the host reproduces itself. A yeast artificial chromosome, or
YAC, for instance, is constructed by assembling the essential
functional parts of a natural yeast chromosome (DNA sequences that
initiate replication, sequences that mark the ends of the chromosomes,
and sequences required for chromosome separation during cell division),
then splicing in a fragment of human DNA. This engineered chromosome is
then reinserted into a yeast cell, which reproduces the YAC during cell
division, as if it were part of the yeast's normal complement of
chromosomes. The result is a colony of yeast cells, each containing a
copy, or clone, of the same fragment of human DNA
- the human genome is not so very different from that of chimpanzees or
mice, and it even shares many common elements with the genome of the
lowly fruit fly
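
The centimorgan definition in the linkage-map notes (a 1/100 chance of separation = 1 centimorgan) can be turned into a small calculation. The sketch below is illustrative only: the function names and the sample counts are made up for this example. The simple conversion follows the notes directly; Haldane's mapping function is not mentioned in the notes and is shown only as one common correction for larger distances, under the assumption of no crossover interference.

```python
import math

def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of offspring in which the two markers were separated."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total

def centimorgans_simple(r: float) -> float:
    """Direct reading of the definition: 1% recombination = 1 cM.
    Reasonable only for closely linked markers."""
    return 100.0 * r

def centimorgans_haldane(r: float) -> float:
    """Haldane's mapping function, d = -50 * ln(1 - 2r) cM.
    Assumption (not from the notes): crossovers occur independently."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical counts: 10 recombinant offspring out of 1000 scored.
r = recombination_fraction(10, 1000)
print(centimorgans_simple(r))    # 1/100 chance of separation -> 1 cM
print(centimorgans_haldane(r))   # slightly above 1 cM
```

For closely linked markers the two values nearly coincide, which is why the simple 1% = 1 cM reading works at short distances.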

(PRIMER ON MOLECULAR GENETICS)


- each chromosome is a physically separated
molecule of DNA that ranges in length from
about 50 million to 250 million bp
- the average gene is 3,000 bp
- the largest known human gene is dystrophin, at
2.4 million bp
- repeat sequences are thought to have no
direct functions, but they shed light on
chromosome structure and dynamics
- chromosome 1 has the most genes (3,168)
and Y chromosome has the fewest (344)
- single nucleotide polymorphisms (SNPs) are
sites in a genome where individuals differ in
their DNA sequence, often by a single
base; scientists believe that the human
genome has at least 10 million SNPs
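
A minimal sketch of how a single-base difference between two people can change a restriction fragment pattern, tying the SNP definition above to the restriction-enzyme notes in the Human Genome Project section. The two sequences are invented for illustration, and the helper names `snp_positions` and `digest` are hypothetical. As a simplification, the cut is placed at the start of each recognition site; the EcoRI site GAATTC is used as an example of a 6-nucleotide site.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (index, base_a, base_b) for every position where two
    equal-length, already aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def digest(seq: str, site: str) -> list[int]:
    """Cut seq at the start of every occurrence of site and return the
    fragment lengths, largest first (as if sorted by size on a gel)."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start)
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]
    return sorted(fragments, reverse=True)

SITE = "GAATTC"  # 6-nucleotide recognition site (EcoRI)

# Invented sequences: person_b carries one changed base inside the second site.
person_a = "AAA" + SITE + "CCCCC" + SITE + "TTTT"
person_b = "AAA" + SITE + "CCCCC" + "GAGTTC" + "TTTT"

print(snp_positions(person_a, person_b))  # the single differing site
print(digest(person_a, SITE))             # three fragments
print(digest(person_b, SITE))             # two fragments: one site is lost
```

Because the changed base destroys one recognition site, the second sequence yields fewer, longer fragments, which is the "dissimilar sets of fragments" effect described in the notes.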
