- instead of the expected 100,000 genes, the initial analysis found about 35,000, and that number has since been whittled down to about 21,000
- 80% of the human genome serves some purpose, biochemically speaking; these regions:
  o specify landing spots for proteins that influence gene activity
  o code for strands of RNA with myriad roles
  o are places where chemical modifications serve to silence stretches of our chromosomes
- a gene's regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins (=noncoding RNA)
- 11,224 DNA stretches are classified as pseudogenes, "dead" genes now known to be active in some cell types or individuals
- there are many genes in which DNA codes for RNA, not a protein, as the end product
- the RNA products of various genes home in on different cell compartments, as if they have fixed addresses where they operate: some go to the nucleus, some to the nucleolus and some to the cytoplasm

GINGERAS: the fundamental unit of the genome and the basic unit of heredity should be the transcript (the piece of RNA decoded from DNA) and not the gene

- 5% of the human genome is conserved across mammals
- DNA's bases function in gene regulation through their interactions with transcription factors and other proteins
- 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested

ENCODE: THE ROUGH GUIDE TO THE HUMAN GENOME

- For the last decade, geneticists have run a seemingly endless stream of genome-wide association studies (GWAS), attempting to understand the genetic basis of disease. They have thrown up a long list of SNPs (variants at specific DNA letters) that correlate with the risk of different conditions.
- The ENCODE team have mapped all of these to their data. They found that just 12 percent of the SNPs lie within protein-coding areas. They also showed that, compared to random SNPs, the disease-associated ones are 60 percent more likely to lie within functional, non-coding regions, especially in promoters and enhancers. This suggests that many of these variants are controlling the activity of different genes, and provides many fresh leads for understanding how they affect our risk of disease. "It was one of those too good to be true moments," says Birney. "Literally, I was in the room [when they got the result] and I went: Yes!"
- Imagine a massive table. Down the left side are all the diseases that people have done GWAS studies for. Across the top are all the possible cell types and transcription factors (proteins that control how genes are activated) in the ENCODE study. Are there hotspots? Are there SNPs that correspond to both? Yes. Lots, and many of them are new.
- Take Crohn's disease, a type of bowel disorder. The team found five SNPs that increase the risk of Crohn's, and that are recognised by a group of transcription factors called GATA2. "That wasn't something that the Crohn's disease biologists had on their radar," says Birney. "Suddenly we've made an unbiased association between a disease and a piece of basic biology." In other words, it's a new lead to follow up on.
- "We're now working with lots of different disease biologists looking at their data sets," says Birney. "In some sense, ENCODE is working from the genome out, while GWAS studies are working from disease in. Where they meet, there is interest." So far, the team have identified 400 such hotspots that are worth looking into. Of these, between 50 and 100 were predictable. Some of the rest make intuitive sense. Others are head-scratchers.

ON THE IMMORALITY OF TELEVISION SETS: FUNCTION IN THE HUMAN GENOME ACCORDING TO THE EVOLUTION-FREE GOSPEL OF ENCODE (Dan Graur)

- less than 10% of the genome is evolutionarily conserved through purifying selection
- according to ENCODE, a biological function can be maintained indefinitely without selection, which implies that at least 80-10=70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these functional regions or because no mutation in these regions can be deleterious
- problems in ENCODE logic:
  o using the seldom-used causal role definition of biological function and then applying it inconsistently to different biochemical properties
  o committing the logical fallacy of affirming the consequent
  o failing to appreciate the crucial difference between junk DNA and garbage DNA
  o using analytical methods that yield biased errors and inflate estimates of functionality
  o favouring statistical sensitivity over specificity
  o emphasizing statistical significance rather than the magnitude of the effect
- in biology, there are 2 main concepts of function:
  o selected effect: the function of a trait is the effect for which it was selected, or by which it is maintained
  o causal role: ahistorical and non-evolutionary: for a trait Q to have a causal role function, G, it is necessary and sufficient that Q performs G
  (ex:) TATAAA is maintained by natural selection to bind a transcription factor; a mutated sequence resembling it also binds the transcription factor, but does not result in transcription (no adaptive or maladaptive consequence); hence, the second sequence has no selected effect function, but its causal role function is to bind a transcription factor
- from an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy it; unless a genomic functionality is actively protected by selection, it will accumulate deleterious mutations and will cease to be functional
- the fact that it is sometimes difficult to identify selection should never be used as a justification to ignore selection altogether in assigning functionality to parts of the human genome
- the surest indicator of the existence of a genomic function is that losing it has some phenotypic consequence for the organism
- functional regions of the genome should evolve more slowly and be more conserved among species than non-functional ones
- Ward and Kellis confirmed that approx. 5% of the genome is interspecifically conserved and an additional 4% of the human genome is under selection
- According to ENCODE:
  o 74.7% of the genome is transcribed
  o 56.1% is associated with modified histones
  o 15.2% is found in open-chromatin areas
  o 8.5% binds transcription factors
  o 4.6% consists of methylated CpG dinucleotides
- transcription is fundamentally a stochastic process
- classes of sequences that are known to be abundantly transcribed, but are typically devoid of function:
  o pseudogenes (up to 1/10 transcribed; they lack coding potential due to the presence of disruptive mutations, evolve very rapidly and are mostly subject to no functional constraint)
  o introns (some human introns harbour regulatory sequences (TISHKOFF, 2006) as well as sequences that produce small RNA molecules (HIROSE, 2003; ZHOU, 2004))
  o mobile elements
- less than 2% of the histone modifications may have something to do with function
- the nematode Caenorhabditis elegans has 20,517 protein-coding genes
- misconceptions in common objections to junk DNA:
  o a lack of knowledge of the original and correct sense of the term
  o the belief that evolution can always get rid of nonfunctional DNA
  o the belief that future potential constitutes a function
- in the majority of known bacterial species, selection against excess genome is extremely efficient due to the enormous effective population sizes, and the fact that replication time and, hence, generation time are correlated with genome size

BRENNER: differentiated between junk DNA and garbage DNA; the excess DNA in our genome is junk and it is there because it is harmless, as well as being useless, and because the molecular processes generating extra DNA outpace those getting rid of it
- indifferent DNA refers to DNA sites that are functional, but show no evidence of selection against point mutations; deletion of these sites, however, is deleterious, and is subject to purifying selection

PURPOSE IS THE ONLY THING EVOLUTION CANNOT PROVIDE

EVIDENCE OF ABUNDANT PURIFYING SELECTION IN HUMANS FOR RECENTLY ACQUIRED REGULATORY FUNCTIONS

- although only 5% of the human genome is conserved across mammals, a substantially larger portion is biochemically active, raising the question of whether the additional elements evolve neutrally or confer a lineage-specific fitness advantage
- mammalian conservation suggests that approx. 5% of the human genome is conserved due to noncoding and regulatory roles, but more than 80% is transcribed, bound by a regulator, or associated with chromatin states suggestive of regulatory functions
- human constraint correlates with mammalian conservation, mRNA splice sites and regulatory elements; similar selective pressures act in humans and across mammals
- a substantial fraction of human constraint lies outside mammalian-conserved regions
- regions that do not overlap with active ENCODE elements and inactive chromatin states show lower constraint than ancestral repeats, suggesting that they may provide a more accurate neutral reference than repeats that can have exapted functions
- mammalian-conserved regions lacking ENCODE activity show reduced human constraint relative to active regions, suggesting recent loss of function and activity
- these also show higher primate divergence relative to active regions, suggesting that some loss of constraint predates human-macaque divergence
- almost half of human constraint lies outside mammalian-conserved regions, even though the strength of human constraint is higher in conserved elements
- protein-coding constraint occurs primarily in conserved regions, whereas regulatory constraint is primarily lineage-specific, as proposed during mammalian radiation
- genome-wide association studies suggest that 85% of disease-associated variants are noncoding, a fraction similar to the proportion of human constraint that we estimate lies outside protein-coding regions; this suggests that mutations outside conserved elements play important roles in both human evolution and disease

A SLIGHTLY DIFFERENT RESPONSE TO TODAY'S ENCODE HYPE

- the deletion boundaries were chosen so that regulatory elements proximal to the flanking genes remained intact
- the heterozygous mice appeared phenotypically normal
- the homozygous deletion mice for both deletions were viable
- the deletions were not embryonic lethal, as shown by the approx. 1:2:1 ratio (wild-type : heterozygous : mutant homozygous) among the offspring
- phenotypic parameters measured in the homozygous deletion mice, compared with controls:
  o post-natal survival rates for 25 weeks
  o measurable growth retardation
  o clinical chemistry tests (general and specific plasma parameters)
  o morphological abnormalities
  o abnormal growth
  o tissue degeneration
  o organ mass was similar in both groups of deletion mice and their wild-type littermates
- molecular level impact: only 2 out of the 108 quantitative assays revealed detectable alterations in levels of expression: Prkacb reduced in the heart and Rpp30 reduced in the brain
- in MMU3 desert, beta-galactosidase expression
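The 1:2:1 genotype ratio mentioned in the notes on the deletion mice is the kind of claim usually checked with a chi-square goodness-of-fit test. The following is a minimal sketch of that check, not the procedure used in the study: the function name `mendelian_chi_square` and the litter counts are hypothetical, invented purely for illustration.

```python
import math


def mendelian_chi_square(wild_type, heterozygous, homozygous):
    """Chi-square goodness-of-fit test against a 1:2:1 genotype ratio.

    Returns (statistic, p_value).
    """
    observed = [wild_type, heterozygous, homozygous]
    total = sum(observed)
    # Expected counts under Mendelian segregation from a het x het cross.
    expected = [total * 0.25, total * 0.50, total * 0.25]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom; for df = 2 the
    # chi-square survival function has the closed form exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical offspring counts (NOT data from the study).
stat, p = mendelian_chi_square(26, 48, 26)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
if p > 0.05:
    print("no evidence of departure from 1:2:1 (no embryonic lethality signal)")
else:
    print("genotype ratio departs from 1:2:1")
```

A marked shortage of homozygous mutants would push the statistic up and the p-value down, which is what embryonic lethality would look like in such counts.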
MEGABASE DELETIONS OF GENE DESERTS RESULT IN VIABLE MICE

- the functional importance of the roughly 98% of mammalian genomes not corresponding to protein-coding sequences remains largely undetermined
- some large-scale deletions of non-coding DNA (=gene deserts) can be well tolerated by an organism
- although gene inactivation can sometimes fail to result in a detectable phenotype, this is usually related to the removal of genes with redundancy elsewhere in the genome
- the deleted gene deserts, mapping to mouse chromosome 3 and chromosome 19 (with no evidence of transcription), contain 1,243 human-mouse conserved non-coding elements

NUCLEOSOME

- each of our cells (or more correctly, nearly all of our cells) contains a copy of this genome, encoded in 3 billion bp of DNA
- a collection of repair enzymes corrects chemical changes inflicted on the strands by environmental insults
- nucleosomes protect the delicate strands from physical damage
- the job of the nucleosome is paradoxical, requiring it to perform 2 opposite functions simultaneously: on one hand, the nucleosome must be stable, forming tight, sheltering structures that compact the DNA and keep it from harm; on the other hand, the nucleosome must be labile enough to allow the information in the DNA to be used; polymerases must be allowed access to the DNA, both to transcribe mRNA for building new proteins and to replicate the DNA when the cell divides; the method by which nucleosomes solve these opposed needs is not well understood, but may involve a partial unfolding of the DNA from around the nucleosome, one loop at a time, as the information in the DNA is read
- nucleosomes also modify the activity of the genes that they store: each nucleosome is composed of 8 histone proteins bundled tightly together at the centre, encircled by 2 loops of DNA; the histone proteins, however, are not completely globular like most other proteins: they have long tails, which comprise nearly a quarter of their length; the tails extend outward from the compact nucleosome, reaching out to neighbouring nucleosomes and binding them tightly together; the nucleus contains regulatory enzymes that chemically modify these tails to weaken their interactions; in this way, the cell makes particular genes more accessible to polymerases, allowing their particular information to be copied and used to build new proteins
- the histone proteins are perfectly designed for their jobs, so much so that histones are nearly identical in all non-bacterial organisms; even slight modifications can be lethal
- the surface of the octamer is decorated with positively charged AA that interact strongly with the negatively charged phosphate groups of the DNA; this serves to glue the DNA strand to the protein core
- the human genome is not so very different from that of chimpanzees or mice, and it even shares many common elements with the genome of the lowly fruit fly

HUMAN GENOME PROJECT (TO KNOW OURSELVES)

genetic linkage map: based on careful analyses of human inheritance patterns; it indicates for each chromosome the whereabouts of genes or other heritable markers, with distances measured in centimorgans (=measure of recombination frequency); the closer 2 genes are on a single chromosome, the less likely they are to get split up during genetic recombination; when they are close enough that the chances of being separated are only 1/100, they are said to be separated by a distance of 1 centimorgan
physical maps: distances between features are measured not in genetic terms, but in real physical units (base pairs)
- some regions of the genome resist cloning in YACs and others are prone to rearrangements
- restriction enzymes: cleave dsDNA molecules at specific recognition sites, usually 4 or 6 nucleotides long
- when digested with a particular restriction enzyme, identical segments of human DNA yield identical sets of restriction fragments; on the other hand, DNA from the same genomic region of 2 different people, with their subtly different genomic sequences, can yield dissimilar sets of fragments, which then produce different patterns when sorted according to size
- a third necessary tool is some means of DNA amplification. The classic example is the cloning vector, which may be circular DNA molecules derived from bacteria or from bacteriophages (viruslike parasites of bacteria), or artificial chromosomes constructed from yeast or bacterial genomic DNA. The characteristic all these vectors share is that fragments of foreign DNA can be inserted into them, whereby the inserted DNA is replicated along with the rest of the vector as the host reproduces itself. A yeast artificial chromosome, or YAC, for instance, is constructed by assembling the essential functional parts of a natural yeast chromosome (DNA sequences that initiate replication, sequences that mark the ends of the chromosomes, and sequences required for chromosome separation during cell division), then splicing in a fragment of human DNA. This engineered chromosome is then reinserted into a yeast cell, which reproduces the YAC during cell division, as if it were part of the yeast's normal complement of chromosomes. The result is a colony of yeast cells, each containing a copy, or clone, of the same fragment of human DNA
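The point made in the notes about restriction digests, that the same genomic region from 2 people can yield different fragment sets, can be sketched in a few lines. This is a toy illustration under stated assumptions: the helper `digest` and both sequences are hypothetical, made up for the example. The enzyme modelled is EcoRI, which recognises GAATTC and cuts after the G.

```python
def digest(sequence, site, cut_offset):
    """Return the sorted fragment lengths after cutting at every `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    # Sorting by size mimics reading the band pattern off a gel.
    return sorted(len(fragment) for fragment in fragments if fragment)


ECORI_SITE = "GAATTC"

# Hypothetical sequences: person B carries a single-base change (A -> G)
# that destroys the second EcoRI site present in person A.
person_a = "ATGAATTCGGCTAGAATTCTTAC"
person_b = "ATGAATTCGGCTAGAGTTCTTAC"

print(digest(person_a, ECORI_SITE, 1))  # [3, 9, 11]
print(digest(person_b, ECORI_SITE, 1))  # [3, 20]
```

A single-base difference that creates or destroys a recognition site is enough to change the fragment pattern, which is why such differences can serve as heritable markers on a map.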
(PRIMER ON MOLECULAR GENETICS)
- each chromosome is a physically separate molecule of DNA that ranges in length from about 50 million to 250 million bp
- the average gene is 3,000 bp
- the largest known human gene is dystrophin, at 2.4 million bp
- repeat sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics
- chromosome 1 has the most genes (3,168) and the Y chromosome has the fewest (344)
- single nucleotide polymorphisms (SNPs) are sites in a genome where individuals differ in their DNA sequence, often by a single base; scientists believe that the human genome has at least 10 million SNPs