17.2 RNAs in Human Genome (Sam Griffiths-Jones)
24.2 Population Genetics of the Human Genome (Gil McVean)
2.3 Association Mapping and the Human Genome (Lon Cardon)
9.3 The Human Genome and Human Evolution (Chris Tyler-Smith)
History
Our interests.
Sequencing Strategies
Public effort- strategy:
Celera - strategy:
From Myers 99
Unfair competition: IC delivering the same goods but with state funding.
Unfair competition: Celera delivering the same goods but can use IC data, while IC cannot use Celera data.
2000: Arabidopsis thaliana
2001: Human Genome
2002: Mouse Genome
The Genome OnLine Database knows of 958 genome sequencing projects, of which 169 are completed
[Figure: genome sizes of sequenced organisms]
Birds: Chicken
Nematodes: Caenorhabditis elegans 100 Mb, Caenorhabditis briggsae 80 Mb
Sea Urchin: Strongylocentrotus purpuratus
Multicellular Plants: Arabidopsis thaliana 125 Mb, Rice 430 Mb
Other sizes in the figure (labels lost): 1.2 Gb, 1.7 Gb, 0.4 Gb, 1.9 Gb, 800 Mb, 165 Mb, 270 Mb, 780 Mb, 278 Mb
[Figure: the human chromosomes 1-22, X and Y, each labelled with its size in Mb, plus the mitochondrial genome (.016 Mb)]
Total: 3.2 × 10^9 bp
[Figure: zooming in from genome to sequence; other labels: α-globin, Myoglobin]
×5,000
6 × 10^4 bp: β-globin (chromosome 11)
×20
3 × 10^3 bp: 5' flanking, Exon 1, Exon 2, Exon 3, 3' flanking
×10^3
30 bp:
DNA:     ATTGCCATGTCGATAATTGGACTATTTGGA
Protein: aa aa aa aa aa aa aa aa aa aa
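The step from 30 bp of DNA to 10 amino acids can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code and a reading frame that starts at the first base of the sequence; the slide states neither, and the names `CODON_TABLE` and `translate` are introduced here only for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon). Assumed here; the slide does not name a code.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# Assumed reading frame: starts at the first base of the slide's sequence.
protein = translate("ATTGCCATGTCGATAATTGGACTATTTGGA")
print(len(protein), protein)  # 10 IAMSIIGLFG
```

Shifting the start by one or two bases gives a different protein, which is one reason the reading frame matters for annotation.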
Nuclear Genome
Highly conserved - coding: 1.5%
Highly conserved - other: 3.5%
Transposon based repeats: 45%
Heterochromatin: 6.6%
Other non-conserved: 44%
Mendelian inheritance 1 (typically) Recombination 1/130 kb
Gene Density:
Pseudogenes:
20000
Processed Pseudogenes
α-globins (7), growth hormone (5), Class I HLA heavy chain (20), ...
Dispersed: Pyruvate dehydrogenase (2), Aldolase (5), PAX (>12), ...
Clustered and Dispersed: HOX (38, 4), Histones (61, 2), Olfactory receptors (>900, 25), ...
Transposons
Smallest: 1
Smallest: 10s of bp
Simple Eukaryotic
Alternative Splicing
1. A challenge to automated annotation.
2. How widespread is it?
3. Is it always functional?
4. How does it evolve?
Cartegni, L. et al. (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 3.4.285 + HMG p291-294
small nucleolar: over 100 types; RNA modification and processing
small nuclear: involved in splicing
very small (~22 bp): regulation
large cytosolic subunit
small mitochondrial subunit
large mitochondrial subunit
transfer RNA > 1500 types
Genome Annotation
Proteins
Genomes ESTs
http://genome.ucsc.edu/
A. Map gene characteristics to each nucleotide, then extract a legal prediction by dynamic programming.
B. Use a hidden Markov model (HMM) to describe biological knowledge of gene structure.
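Both ideas meet in the Viterbi algorithm, which uses dynamic programming to find the most probable path of hidden states through an HMM. The sketch below is a toy with two states, 'N' (non-coding) and 'C' (coding); every probability in it is an illustrative assumption, not an estimate from real genes, and real gene finders use many more states (exons, introns, splice sites, codon positions).

```python
import math

# Hypothetical two-state model. All numbers are assumptions for illustration:
# the coding state is made GC-rich, the non-coding state AT-rich.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # score[s]: best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATATGCGCGCGCGC"))  # NNNNNNNNCCCCCCCCCC
```

Working in log space avoids numerical underflow on long sequences; the "legal prediction" constraint of approach A corresponds to forbidding transitions by giving them probability zero.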
Average number of mitoses per generation (age: mitoses)
Male generation: 15: 35 .. 20: 150
Female generation: ~24
Single nucleotide substitutions: ~10^-7
Microsatellites (~100,000): ~10^-2
Small insertions/deletions: ~10^-8
Crow, JF (2000) The Origins, Patterns and Implications of Human Spontaneous Mutation. Nature Reviews Genetics 1.1.40-47 + Strachan and Read (2004) chapter 11 + Jobling, Hurles and Tyler-Smith (2004) chapter 2
Recombination
Recombination:
1 meiosis
Gene Conversion:
Lander et al. (2001) Initial sequencing and analysis of the human genome. Nature 409.860-912 + Kong, A. et al. (2002) A high resolution recombination map of the human genome. Nature Genetics
Population scenario
[Figure: population scenario; individuals labelled A or C]
ThrPro ACGCCA
ArgSer AGGCCG
ThrSer ACGCCG
The selection criterion could in principle be anything, but selection against amino acid changes is by far the most important.
Certain events have functional consequences and will be selected out. The strength and localization of this selection is of great interest.
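Substitutions (single-nucleotide changes in the 61 sense codons)
                   Number   Percent
Total                 549       100
Synonymous            134        25
Nonsynonymous         415        75
  Missense            392        71
  Nonsense             23         4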
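The counts 549, 134, 392 and 23 can be reproduced by enumerating every single-nucleotide change in the 61 sense codons (61 × 9 = 549) and checking what happens to the encoded amino acid. The sketch below assumes the standard genetic code; the function name `classify_all_substitutions` is introduced only for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon). Assumed here; the slide does not name a code.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_all_substitutions():
    """Count single-nucleotide changes in sense codons by their effect."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only the 61 sense codons are starting points
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                new_amino_acid = CODON_TABLE[mutant]
                if new_amino_acid == amino_acid:
                    counts["synonymous"] += 1  # same amino acid
                elif new_amino_acid == "*":
                    counts["nonsense"] += 1    # sense codon becomes a stop
                else:
                    counts["missense"] += 1    # different amino acid
    return counts


counts = classify_all_substitutions()
print(counts, sum(counts.values()))
```

The enumeration weights every possible change equally; it says nothing about how often each change occurs or survives selection, which is what the observed rates address.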
Examples of rates

Organism          Gene            Syno/year      Non-Syno/year
RNA virus
Influenza A       Hemagglutinin   13.1 × 10^-3   3.6 × 10^-3
Hepatitis C       E               6.9 × 10^-3    0.3 × 10^-3
HIV 1             gag             2.8 × 10^-3    1.7 × 10^-3
DNA virus
Hepatitis B       P               4.6 × 10^-5    1.5 × 10^-5
Herpes Simplex    Genome          3.5 × 10^-8
Nuclear genes
Mammals           c-mos           5.2 × 10^-9    0.9 × 10^-9
Mammals           α-globin        3.9 × 10^-9    0.6 × 10^-9
Mammals           histone 3       6.2 × 10^-9    0.0
Genealogical Structures
Homology:
The existence of a common ancestor (for instance for 2 sequences)
ccagtcg
cagtct
ccggtcg
Phylogeny
Only finding common ancestors. Only one ancestor.
Pedigree:
Populations
Grand parents
Parents
Now
Africa
Non-Africa
Inter.SNP Consortium (2001): A map of human genome sequence variation containing 1.42 million SNPs. Nature 409.928-33
Pedigrees
Chinese
http://demography.anu.edu.au/People/Staff/zhongwei.html
Quebec French
Heyer and Tremblay, 1998 PNAS
Mormons
http://genealogy-mormons.com/
Icelandic
http://www.decode.com + Helgason, A. et al. (2003 June) A population-wide coalescent analysis of Icelandic matrilineal and patrilineal genealogies: Evidence for a faster evolutionary rate of mtDNA lineages than Y-chromosomes. American Journal of Human Genetics.
[Figure (Helgason): total pedigree by year, linking an ancestor cohort (1848-1892) to a contemporary cohort (1972-2002)]
Matrilines: N = 31,817; 77.9%
Patrilines: ancestral cohort born 1848-1892, N = 31,659; 73.9%
Other values in the figure: g = 4.3, g = 3.8, N = 64,150, N = 66,910
Genealogical Questions
Pedigrees
Time back to first individual common ancestor to everyone
ARG questions:
First Eukaryotes
First Chordates
First Vertebrates
First Mammals
First Primates
First Hominoids
Chimp-Human Split
Hedges, SB (2002) The Origin and Evolution of Model Organisms. Nature Reviews Genetics 3.11.838-848.
3 Problems:
i. Test all possible relationships.
ii. Examine unknown internal states.
iii. Explore unknown paths between states at nodes.
Time Direction
ATTGCGTATATAT.CAG
ATTGCGTATATAT.CAG
ATTGCGTATATAT.CAG
observable
observable
observable
Protein Structure
C A
A C A U G U
Gene Structure
Observable
Unobservable
AGTGGTACCATTTAATGCG.....
AGTGGTACTATTTAGTGCG.....
P_coding{ATG --> GTG} or P_non-coding{ATG --> GTG}
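Before asking whether a change is more probable under a coding or a non-coding model, the changes have to be located. The sketch below lists the positions at which the two aligned sequences differ; it assumes the visible 19 bases are already aligned without gaps, and the helper name `differences` is introduced only for this example.

```python
def differences(a, b):
    """List (position, base_in_a, base_in_b) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The visible parts of the two sequences on the slide (positions are 0-based).
seq1 = "AGTGGTACCATTTAATGCG"
seq2 = "AGTGGTACTATTTAGTGCG"

diffs = differences(seq1, seq2)
print(diffs)                       # [(8, 'C', 'T'), (14, 'A', 'G')]
print(seq1[14:17], seq2[14:17])    # ATG GTG
```

The second difference is the ATG to GTG change named on the slide. Scoring it requires substitution probabilities for each model, which are not given here.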
Simple Prokaryotic
Simple Eukaryotic
ACTCCT
HIV proteinase
8 2
Turnip
Gene Order/Orientation.
Sequences
Protein Structure
General Theme.
Linkage Mapping
D r M
From McVean
Dominant/Recessive.
2Ne generations
genotype
Genotype Phenotype
phenotype
BRCA2 example
1000 cases and 1000 controls typed at 8 microsatellite markers
Bayesian analysis
Causative SNPs.
The International HapMap Project Nature 426, 789 - 796 (18 Dec 2003) http://www.hapmap.org/
HapMap
Ontologies
A Structured Vocabulary Consistent across species.
Purpose:
Facilitate communication among researchers
Facilitate communication among computer systems
Cellular Component
http://www.geneontology.org
Gene Ontology Consortium (2001) Creating the Gene Ontology Resource: Design and Implementation. Genome Research 11.1425-33
Gene Ontology Consortium (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32.D258-61.
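[Table: structure holdings by experimental technique; the first two column headers and the two row labels were lost]
Exp. Tech.   (lost)   (lost)   Nucleic Acids   Carbohydrates   Total
(lost)                         719             14              20645
(lost)                         569             4               3603
Total        21948    994      1288            18              24248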
http://www.strgen.org/
http://www.nysgrc.org/
http://www.oppf.ox.ac.uk/
http://pdb.ccdc.cam.ac.uk/pdb/strucgen.html
John Westbrook, Zukang Feng, Li Chen, Huanwang Yang and Helen M. Berman. The Protein Data Bank and structural genomics. Nucleic Acids Research, 2003, Vol. 31, No. 1, 489-491
http://www.strgen.org/status/mpoverview.html
Proteomics
2D PAGE gels (polyacrylamide gel electrophoresis)
MALDI
Protein Micro-arrays
http://www.hupo.org
Hanash, S. (2003) Disease Proteomics. Nature 422.226-
Aebersold, R. and M. Mann (2003) Mass spectrometry-based proteomics. Nature 422.198-
Gavin et al. (2002) Functional Organisation of the Yeast Proteome by systematic analysis of protein complexes. Nature 415.141-
Summary
The Genome
Minimal ARGs and Haplotype Blocks (Song)
a: (3,4) b: (3,4) c: (15,16) d: (16,17) e: (35,36) f: (35,36) g: (36,37)
Contagious Dependence
263 cM 6.800
30kb
Key articles:
Lander et al. (2001) Initial Sequencing and Analysis of the Human Genome. Nature
Venter et al. (2001) The Sequence of the Human Genome. Science 291.1304-1351
References: www-pages.
Major sequencing centers:
Baylor College of Medicine Genome Sequencing Center: hgsc.bcm.tcm.edu/
Celera: www.celera.com
DoE Joint Genome Institute: www.jgi.doe.gov
Genoscope: www.genoscope.cns.fr
TIGR: www.tigr.org
Washington University Genome Sequencing Center: www.genome.wustl.edu
Wellcome Trust Sanger Institute: www.sanger.ac.uk
www.-genome.wi.mit.edu
www.ensembl.org
www.ebi.ac.uk
www.ncbi.nlm.nih.gov
http://www.nature.com/genomics/human/
http://wit.integratedgenomics.com/GOLD/
http://www2.ebi.ac.uk/genomes/
http://sayer.lab.nig.jp/~silver/index.html
http://www.ebi.ac.uk/proteome/
http://www.ncbi.nlm.nih.gov/
http://www.hapmap.org/
http://www.ncbi.nlm.nih.gov/omim/