Computational Molecular Biology

Biochem 218 / BioMedical Informatics 231


http://biochem218.stanford.edu/

Genomics and Bioinformatics

Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy)

Faculty, TAs and Staff


Doug Brutlag

Lee Kozar

Maeve O'Huallachain

Dan Davison

Course and Video Availability


Alway M114
Tuesdays & Thursdays 2:15-3:30 PM

Course Web Site


http://biochem218.stanford.edu/

Stanford Center for Professional Development


http://scpd.stanford.edu/

Videos available 24 hours/day, 7 days/week
Course offered Autumn, Winter and Spring quarters

Course Requirements
Lectures
Theoretical background of current methods
Strengths and weaknesses of current approaches
Future directions for improvements

Demonstrations
Applications (Mac, PC, Unix, Web)
Web applications
Illustrate homework

All homework and questions must be submitted by email to homework218@cmgm.stanford.edu
Several homework assignments (35%)
Due one week after assigned

Final project (Due March 12th)


A critical or comparative review of computational approaches to any problem in computational molecular biology
Propose a new approach
Implement a new approach
Examples of previous projects for the class can be found at http://biochem218.stanford.edu/Projects.html

David Mount
Bioinformatics: Sequence and Genome Analysis 2nd Edition

Jin Xiong Essential Bioinformatics

Richard Durbin et al. Biological Sequence Analysis

Jones & Pevzner Bioinformatics Algorithms

Dan Gusfield
Algorithms on Strings, Trees & Sequences

Baldi & Brunak

Bioinformatics: The Machine Learning Approach

Higgins & Taylor


Bioinformatics: Sequence, Structure & Databanks

NCBI Handbook
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook

EMBL-EBI Home Page


http://www.ebi.ac.uk/

Berg, Tymoczko & Stryer


Biochemistry, Fifth Edition

Benjamin Lewin Genes IX

Genomics, Bioinformatics & Computational Biology


Genomics Structural Genomics Bioinformatics Proteomics

Computational Molecular Biology Computational Biology

Genomics, Bioinformatics & Computational Biology


Genomics Bioinformatics Systems Biology Structural Genomics Proteomics

Computational Molecular Biology Computational Biology

Machine Learning Artificial Intelligence

Robotics Databases Information Theory Graph Theory

Statistics & Probability

Algorithms

What is Bioinformatics?
[Diagram: Biological Information. Individuals: DNA, RNA, Protein, Phenotype. Populations: Selection, Evolution.]

Computational Goals of Bioinformatics


Learn & Generalize: Discover conserved patterns (models) of sequences, structures, interactions, metabolism & chemistries from well-studied examples.
Prediction: Infer function or structure of newly sequenced genes, genomes, proteins or proteomes from these generalizations.
Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling and gene expression.
Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function and metabolism.
Engineer: Construct novel organisms or novel functions or novel regulation of genes and proteins.
Gene Therapy: Target specific genes or mutations, or use RNAi, to change a disease phenotype.

Central Paradigm of Molecular Biology

DNA -> RNA -> Protein -> Phenotype (Symptoms)

Molecular Biology of the Gene 1965

Central Paradigm of Bioinformatics


Genetic Information
MVHLTPEEKT AVNALWGKVN VDAVGGEALG RLLVVYPWTQ RFFESFGDLS SPDAVMGNPK VKAHGKKVLG AFSDGLAHLD NLKGTFSQLS ELHCDKLHVD PENFRLLGNV LVCVLARNFG KEFTPQMQAA YQKVVAGVAN ALAHKYH

Molecular Structure

Biochemical Function

Phenotype (Symptoms)

Challenges Understanding Genetic Information


Genetic Information Molecular Structure Biochemical Function Phenotype

Genetic information is redundant
Structural information is redundant
Genes and proteins are meta-stable
Single genes have multiple functions
Genes are one dimensional but function depends on three-dimensional structure

Redundancy in Genomic & Protein Sequences


DNA is double-stranded
Genetic code
Acceptable amino-acid replacements
Intron-exon variation
Alternative splicing
Strain variations (SNPs)
Sequencing errors
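The first two sources of redundancy listed above can be made concrete with a short sketch. It assumes the standard genetic code; the function names are illustrative and not part of the course material.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def synonymous_codons():
    """Group the 64 codons by the amino acid (or stop, '*') they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)


def translate(dna):
    """Translate a coding sequence in frame 1, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(len(synonymous_codons()["L"]), "codons encode leucine")
    # Two different DNA sequences that encode the same peptide.
    print(translate("ATGGTGCAC"), translate("ATGGTTCAT"))
```

Because several codons map to one amino acid, and either strand can carry a gene, a DNA-level comparison may miss similarity that is plain at the protein level.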

Using A Controlled Vocabulary for Literature Search
http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh

Gene Ontology Database


http://www.geneontology.org/

UCSC Genome Browser


http://genome.ucsc.edu/

ExPASy Proteomics Server


http://www.expasy.ch/doc.html

Inferring Biological Function from Protein Sequence


Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type) C x {2,4} C x {12} H x {3,5} H

Sequences of Common Structure or Function

Sequence Similarity
              10        20        30        40              50
Query VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS
       |:| :|: | |:||||     | |:||| |: : :|:|  :|  |      |:  |
Match HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
              10          20        30        40        50
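One common way to summarize a pairwise alignment like the one above is percent identity over the aligned (ungapped) columns. The sketch below uses the Query and Match strings as given; the function name and the choice to exclude gapped columns from the denominator are assumptions, since the slides do not define a particular measure.

```python
QUERY = "VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS"
MATCH = "HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"


def identity(a, b, gap="-"):
    """Return (identical, aligned) column counts for two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions divide by the full alignment length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        aligned += 1
        identical += x == y
    return identical, aligned


if __name__ == "__main__":
    same, total = identity(QUERY, MATCH)
    print(f"{same}/{total} identical = {100 * same / total:.0f}% identity")
```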

A Typical Motif: Zinc Finger DNA Binding Motif

C..C............H....H
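The C2H2 pattern C x {2,4} C x {12} H x {3,5} H translates directly into a regular expression, where x is any residue. The sketch below is one way to scan a protein sequence for it; the test fragment is a hypothetical string built to fit the pattern, not data from the slides.

```python
import re

# "x {m,n}" (any residue, m to n times) becomes ".{m,n}" in regex syntax.
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2(seq):
    """Return (start, matched_text) for each non-overlapping match in seq."""
    return [(m.start(), m.group()) for m in C2H2.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical fragment chosen to contain one C2H2-like stretch.
    fragment = "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
    for start, text in find_c2h2(fragment):
        print(start, text)
```

A consensus pattern like this is all-or-nothing: a single substitution at a C or H position loses the match, which is one motivation for the weight matrices shown on the following slides.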

Inferring Biological Function from Protein Sequence


Weight Matrices or Position-Specific Scoring Matrices

     1   2   3   4   5   6   7   8   9  10  11  12
A    2   1   3  13  10  12  67   4  13   9   1   2
R    7   5   8   9   4   0   1  16   7   0   1   0
N    0   8   0   1   0   0   0   2   1   1  10   0
D    0   1   0   1  13   0   0  12   1   0   4   0
C    0   0   1   0   0   0   0   0   0   2   2   1
Q    1   1  21   8  10   0   0   7   6   0   0   2
E    2   0   0   9  21   0   0  15   7   3   3   0
G    9   7   1   4   0   0   8   0   0   0  46   0
H    4   3   1   1   2   0   0   2   2   0   5   0
I   10   0  11   1   2  10   0   4   9   3   0  16
L   16   1  17   0   1  31   0   3  11  24   0  14
K    3   4   5  10  11   1   1  13  10   0   5   2
M    7   1   1   0   0   0   0   0   5   7   1   8
F    4   0   3   0   0   4   0   0   0  10   0   0
P    0   6   0   1   0   0   0   0   0   0   0   0
S    1  17   0   8   3   1   3   0   2   2   2   0
T    5  22   3  11   1   5   0   2   2   2   0   5
W    2   0   0   0   0   0   0   0   0   1   0   1
Y    1   0   4   2   0   1   0   0   2   4   0   1
V    6   3   1   1   2  15   0   0   2  12   0  28
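A count matrix like the one above is usually converted to log-odds scores before it is used to scan sequences. The sketch below shows one simple way to do that. It reads the slide's numbers as one row of twelve counts per amino acid in the order A R N D ... V, which is an assumption (every column then sums to 80). The pseudocount of 1, the uniform background of 1/20 and all function names are also illustrative choices, not the course's method.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Counts transcribed from the slide, assumed row-wise: one row per amino acid
# (in AMINO_ACIDS order), one column per motif position 1-12.
COUNTS = """
2 1 3 13 10 12 67 4 13 9 1 2
7 5 8 9 4 0 1 16 7 0 1 0
0 8 0 1 0 0 0 2 1 1 10 0
0 1 0 1 13 0 0 12 1 0 4 0
0 0 1 0 0 0 0 0 0 2 2 1
1 1 21 8 10 0 0 7 6 0 0 2
2 0 0 9 21 0 0 15 7 3 3 0
9 7 1 4 0 0 8 0 0 0 46 0
4 3 1 1 2 0 0 2 2 0 5 0
10 0 11 1 2 10 0 4 9 3 0 16
16 1 17 0 1 31 0 3 11 24 0 14
3 4 5 10 11 1 1 13 10 0 5 2
7 1 1 0 0 0 0 0 5 7 1 8
4 0 3 0 0 4 0 0 0 10 0 0
0 6 0 1 0 0 0 0 0 0 0 0
1 17 0 8 3 1 3 0 2 2 2 0
5 22 3 11 1 5 0 2 2 2 0 5
2 0 0 0 0 0 0 0 0 1 0 1
1 0 4 2 0 1 0 0 2 4 0 1
6 3 1 1 2 15 0 0 2 12 0 28
"""


def parse_counts(text):
    """Map each amino acid to its list of per-position counts."""
    rows = [[int(x) for x in line.split()] for line in text.strip().splitlines()]
    return dict(zip(AMINO_ACIDS, rows))


def log_odds(counts, pseudocount=1.0, background=1 / 20):
    """Convert counts to log2(observed frequency / background), per position."""
    n_cols = len(counts["A"])
    pssm = []
    for col in range(n_cols):
        total = sum(counts[aa][col] for aa in AMINO_ACIDS) + pseudocount * 20
        pssm.append({aa: math.log2((counts[aa][col] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_hit(pssm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pssm)
    return max((sum(col[aa] for col, aa in zip(pssm, seq[i:i + width])), i)
               for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    counts = parse_counts(COUNTS)
    pssm = log_odds(counts)
    consensus = "".join(max(AMINO_ACIDS, key=lambda aa: counts[aa][c]) for c in range(12))
    print(consensus, best_hit(pssm, "MK" + consensus + "PP"))
```

Unlike a consensus pattern, the matrix scores every residue at every position, so a sequence with a few unusual residues can still score well overall.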

Sequences of Common Structure or Function
Profiles, PSI-BLAST, Hidden Markov Models

[Figure: hidden Markov model with states AA1-AA6, I1-I5 and D2-D5.]

Buried Treasure

Clustal Globin Alignment

Consensus Sequence From a Multiple Sequence Alignment


ClustalW Insulin Alignments

[Figure: ClustalW alignment of the insulin sequences IPGP, IPDK, IPDG, IPCH, IPCA, IPBO and IPAF over alignment positions 1-120, with a consensus row in which "." marks positions without a consensus residue.]
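A consensus row like the one in the ClustalW figure can be derived column by column from the aligned sequences. The sketch below is a minimal majority-rule version; the 60% threshold, the "." placeholder and the toy alignment are assumptions for illustration and are not ClustalW's own rule.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.6, unknown="."):
    """Return a consensus string for equal-length aligned sequences.

    A column gets its most common residue if that residue appears in at
    least min_fraction of all sequences; otherwise it gets `unknown`.
    Gap characters ("-") never become the consensus residue.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            out.append(unknown)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) >= min_fraction else unknown)
    return "".join(out)


if __name__ == "__main__":
    # Toy alignment (hypothetical), four sequences of four columns.
    print(consensus(["MALW", "MALW", "MAVW", "MAAL"]))
```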

HMM Model of Hemoglobins


http://decypher.stanford.edu/

GrowTree VegF Neighbor Joining Tree

Human Gene Expression Signatures


T Cells Signaling
DNA Damage
Fibroblast Stimulation
B Cells Signaling
CMV Infection
Anoxia
Polio Infection
Monocytes Signaling
IL4
Hormone

Clustering Gene Expression Profiles: Comparison of Methods

D'haeseleer P (2005). Nat Biotechnol. 23,1499-501.

TAMO: Tools for the Analysis of Motifs

Finding Transcription Factor Binding Sites

Co-expressed Genes   Upstream Regions
Pho 5                GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
Pho 8                CACATCGCATCACGTGACCAGT...GACATGGACGGC
Pho 81               GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
Pho 84               TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
Pho                  CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC

Transcription Start

Finding Transcription Factor Binding Sites

Upstream Regions

Co-expressed Genes

GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT

Finding Transcription Factor Binding Sites

Upstream Regions

Co-expressed Genes

ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT

Pho4 binding
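The idea behind these slides, that co-expressed genes may share a short word in their upstream regions, can be sketched with a naive exact-word search. The code below uses the five 5' upstream fragments as first listed on the slides and looks for 6-mers present in every one of them. The word length of 6, the forward-strand-only search and the function names are assumptions; real motif finders typically allow mismatches and use probabilistic models.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs, k):
    """Return the k-mers that occur in every sequence (forward strand only)."""
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
    return common


# 5' fragments of the upstream regions, copied from the first listing on the slides.
UPSTREAM = [
    "GATGGCTGCACCACGTGTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTGGATCTTGA",
]

if __name__ == "__main__":
    for word in sorted(shared_kmers(UPSTREAM, 6)):
        # str.find gives the 0-based offset of the first occurrence.
        print(word, [seq.find(word) for seq in UPSTREAM])
```

The shared word sits at a different offset in each fragment, which is why the sequences have to be searched rather than simply compared column by column.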

Metabolic Networks: BioCyc


http://biocyc.org/

C. crescentus Cell Cycle Gene Expression

Genome Wide Associations in Rheumatoid Arthritis

Pearson, T. A. et al. JAMA 2008;299:1335-1344

Leveraging Genomic Information in Medicine


Novel Diagnostics
  Microchips & Microarrays - DNA
  Gene Expression - RNA
  Proteomics - Protein

Novel Therapeutics
  Drug Target Discovery
  Rational Drug Design
  Molecular Docking
  Gene Therapy
  Stem Cell Therapy

Understanding Metabolism

Understanding Disease
  Inherited Diseases - OMIM
  Infectious Diseases
  Pathogenic Bacteria
  Viruses
