
Bioinformatics

Doug Brutlag, Professor Emeritus


of Biochemistry & Medicine (by courtesy)
Stanford University School of Medicine
Genomics & Medicine
http://biochem118.stanford.edu/
What is Bioinformatics?
Figure: Biological Information (DNA, RNA, Protein, Phenotype; Individuals, Populations; Selection, Evolution)
Computational Goals of Bioinformatics
• Learn & Generalize: Discover conserved patterns (models) of sequences, structures, metabolism & chemistries from well-studied examples.
• Prediction: Infer function or structure of newly sequenced genes, genomes, proteomes or proteins from these generalizations.
• Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling, gene expression...
• Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function, metabolism...
• Engineer: Construct novel organisms, novel functions or novel regulation of genes and proteins.
• Target: Mutations or RNAi to specific genes and transcripts, or drugs to specific protein targets.
Central Paradigm of Molecular Biology
DNA RNA Protein Phenotype
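The DNA → RNA → Protein steps can be made concrete with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1, written as a 64-letter string in TCAG order). It reads the coding strand in frame from its first base and stops at the first stop codon. `transcribe` and `translate` are illustrative names. The input is a made-up coding sequence, not a real gene.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# first, second and third bases each in the order T, C, A, G; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA has the coding-strand sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons in frame from the first base until a stop codon."""
    seq = mrna.upper().replace("U", "T")  # look codons up in the DNA-alphabet table
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGTGCATCTGACTCCTGAGGAGAAGTAA")  # hypothetical coding sequence
print(mrna)             # AUGGUGCAUCUGACUCCUGAGGAGAAGUAA
print(translate(mrna))  # MVHLTPEEK
```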
Central Paradigm of Medicine
Opinions DNA RNA Protein Symptoms
Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Phenotype (Symptoms)
MVHLTPEEKT
AVNALWGKVN
VDAVGGEALG
RLLVVYPWTQ
RFFESFGDLS
SPDAVMGNPK
VKAHGKKVLG
AFSDGLAHLD
NLKGTFSQLS
ELHCDKLHVD
PENFRLLGNV
LVCVLARNFG
KEFTPQMQAA
YQKVVAGVAN
ALAHKYH
Challenges in Understanding Genetic Information
Genetic Information → Molecular Structure → Biochemical Function → Phenotype
• Genetic information is redundant
• Structural information is redundant
Figure: Soybean Leghemoglobin and Sperm Whale Myoglobin
• Genes and proteins are one-dimensional, but their function depends on three-dimensional structure
• Genes and proteins are meta-stable
Discovering Function from Protein Sequence
Sequences of Common Structure or Function
Sequence Similarity
                 10        20        30        40              50
Query    VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGS
          | |  |  | | ||||     | | ||| |     | |   |  |      |   |
Database HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
                 10          20        30        40        50
Dayhoff's PAM 250 Amino Acid Replacement Matrix (1978)
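Percent identity is one simple number that can be attached to the query/database alignment above. This sketch is an illustration only and does not reproduce the scoring of any particular alignment program. It counts identical columns among the columns where neither row has a gap (`-`). Other conventions divide by the full alignment length or by the shorter sequence. Substitution matrices such as PAM 250 also give credit for similar but non-identical residues. As a result, different tools can report different figures for the same alignment.

```python
def alignment_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identities, gap-free columns, percent identity) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where neither row has a gap character.
    paired = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identities = sum(1 for x, y in paired if x == y)
    return identities, len(paired), 100.0 * identities / len(paired)


query = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGS"
database = "HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"
print(alignment_identity(query, database))  # (21, 50, 42.0)
```

For these two rows, 21 of the 50 gap-free columns are identical.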
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C-X(2,4)-C-X(12)-H-X(3,5)-H
Protein Motifs from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi
A Typical Motif: Zinc Finger DNA Binding Motif
CxxCxxxxxxxxxxxxHxxxxH
PSSMs or Weight Matrices
Position   1   2   3   4   5   6   7   8   9  10  11  12
A          2   1   3  13  10  12  67   4  13   9   1   2
R          7   5   8   9   4   0   1  16   7   0   1   0
N          0   8   0   1   0   0   0   2   1   1  10   0
D          0   1   0   1  13   0   0  12   1   0   4   0
C          0   0   1   0   0   0   0   0   0   2   2   1
Q          1   1  21   8  10   0   0   7   6   0   0   2
E          2   0   0   9  21   0   0  15   7   3   3   0
G          9   7   1   4   0   0   8   0   0   0  46   0
H          4   3   1   1   2   0   0   2   2   0   5   0
I         10   0  11   1   2  10   0   4   9   3   0  16
L         16   1  17   0   1  31   0   3  11  24   0  14
K          3   4   5  10  11   1   1  13  10   0   5   2
M          7   1   1   0   0   0   0   0   5   7   1   8
F          4   0   3   0   0   4   0   0   0  10   0   0
P          0   6   0   1   0   0   0   0   0   0   0   0
S          1  17   0   8   3   1   3   0   2   2   2   0
T          5  22   3  11   1   5   0   2   2   2   0   5
W          2   0   0   0   0   0   0   0   0   1   0   1
Y          1   0   4   2   0   1   0   0   2   4   0   1
V          6   3   1   1   2  15   0   0   2  12   0  28
Position-Specific Scoring Matrix for Prokaryotic Helix-Turn-Helix Motifs
Sequence Helix Turn Helix
RCRO_LAMBD F G Q T K T A K D L G V Y Q S A I N K A I H
RCRO_BP434 M T Q T E L A T K A G V K Q Q S I Q L I E A
RCRO_BPP22 G T Q R A V A K A L G I S D A A V S Q W K E
RPC1_LAMBD L S Q E S V A D K M G M G Q S G V G A L F N
RPC1_BP434 L N Q A E L A Q K V G T T Q Q S I E Q L E N
RPC1_BPP22 I R Q A A L G K M V G V S N V A I S Q W E R
RPC2_LAMBD L G T E K T A E A V G V D K S Q I S R W K R
LACR_ECOLI V T L Y D V A E Y A G V S Y Q T V S R V V N
CRP_ECOLI I T Q Q E I G Q I V G C S R E T V G R I L K
TRPR_ECOLI M S Q R E L K N E L G A G I A T I T R G S N
RPC1_CPP22 R G Q R K V A D A L G I N E S Q I S R W K G
GALR_ECOLI A T I K D V A R L A G V S V A T V S R V I N
Y77_BPT7 L S H R S L G E L Y G V S Q S T I T R I L Q
TER3_ECOLI L T T R K L A Q K L G V E Q P T L Y W H V K
VIVB_BPT7 D Y Q A I F A Q Q L G G T Q S A A S Q I D E
DEOR_ECOLI L H L K D A A A L L G V S E M T I R R D L N
RP32_BACSU R T L E E V G K V F G V T R E R I R Q I E A
Y28_BPT7 E S N V S L A R T Y G V S Q Q T I C D I R K
IMMRE_BPPH S T L E A V A G A L G I Q V S A I V G E E T
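A PSSM turns the column-by-column composition of a motif alignment into scores. The sketch below builds one from the first five rows of the helix-turn-helix alignment above. It is a simplified illustration that rests on several assumptions: a uniform background frequency of 0.05 per amino acid, a pseudocount of 1 per amino acid, no sequence weighting, and log-odds measured in bits. `build_pssm` and `score` are illustrative names. Real tools such as PSI-BLAST use database background frequencies and more elaborate weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0, background=0.05):
    """Log-odds scores (bits) for each amino acid at each column of an ungapped alignment.

    Assumes a uniform background frequency and adds `pseudocount` to every
    amino acid so that unseen residues get a finite (negative) score.
    """
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):  # one tuple of residues per motif position
        counts = Counter(column)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of per-position log-odds; > 0 means more motif-like than background."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the motif length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


# First five helix-turn-helix rows of the alignment above, spaces removed.
hth = [
    "FGQTKTAKDLGVYQSAINKAIH",  # RCRO_LAMBD
    "MTQTELATKAGVKQQSIQLIEA",  # RCRO_BP434
    "GTQRAVAKALGISDAAVSQWKE",  # RCRO_BPP22
    "LSQESVADKMGMGQSGVGALFN",  # RPC1_LAMBD
    "LNQAELAQKVGTTQQSIEQLEN",  # RPC1_BP434
]
pssm = build_pssm(hth)
print(score(pssm, hth[0]) > 0)    # True: every residue was seen in its column
print(score(pssm, "W" * 22) < 0)  # True: W occurs in only one column
```

Scanning a longer protein would slide a 22-residue window along it and report the windows with the highest scores.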
Blocks or Fingerprints from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi
Profiles, PSI-BLAST
Hidden Markov Models
Figure: HMM with states AA1 to AA6, I1 to I5 and D2 to D5
Hidden Markov Models from Multiple Sequence Alignments
EBI Course on Protein Motifs/Signatures
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi
Data Mining:
The Search for Buried Treasure
PROSITE Patterns
http://expasy.org/prosite/
• Active site of trypsin-like serine proteases: G-D-S-G-G
• Zinc Finger (C2H2 type): C-X(2,4)-C-X(12)-H-X(3,5)-H
• N-Glycosylation Site: N-[^P]-[ST]-[^P]
• Homeobox Domain Signature: [LIVMF]-X(5)-[LIVM]-X(4)-[IV]-[RKQ]-X-W-X(8)-[RK]
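Once their notation is translated, the patterns listed above can be searched with ordinary regular expressions. This is a minimal sketch that covers only the notation used on this slide:

- single residues;
- `X`/`x` for any residue;
- `[...]` for a set of allowed residues;
- repeat counts such as `(2,4)`;
- exclusions written either as `[^P]` (as above) or in PROSITE's own `{P}` form.

It does not handle PROSITE's `<`/`>` terminus anchors or other extensions. `prosite_to_regex` and `find_sites` are illustrative names and not part of any PROSITE tool.

```python
import re

# One pattern element: a residue or x/X, a [set], or a {excluded set},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(
    r"(?P<body>[A-Za-z]|\[\^?[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<low>\d+)(?:,(?P<high>\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        body = match["body"]
        if body in ("X", "x"):
            body = "."                       # any amino acid
        elif body.startswith("{"):
            body = "[^" + body[1:-1] + "]"   # PROSITE exclusion {P} == regex [^P]
        if match["low"]:
            high = "," + match["high"] if match["high"] else ""
            body += "{" + match["low"] + high + "}"
        parts.append(body)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence)]


print(prosite_to_regex("C-X(2,4)-C-X(12)-H-X(3,5)-H"))  # C.{2,4}C.{12}H.{3,5}H
print(find_sites("N-[^P]-[ST]-[^P]", "MNKSANGTLP"))      # [2, 6]
```

The zero-width lookahead in `find_sites` lets overlapping matches be reported. This matters for short, degenerate patterns such as the N-glycosylation site. The example sequence is made up.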
Swiss Institute of Bioinformatics
http://www.isb-sib.ch/
ExPASy Bioinformatics Resource Portal
http://expasy.org/
UniProt Knowledge Base
http://www.uniprot.org/
UniProt Opsin Entries
http://www.uniprot.org/
UniProt Human Opsin Advanced Search
http://www.uniprot.org/
UniProt Human Opsin Entries Reviewed
http://www.uniprot.org/
UniProt Human Opsin OPN1MW Entry
http://www.uniprot.org/uniprot/P04001
BLAST UniProt Human Opsin OPN1MW Entry
http://www.uniprot.org/uniprot/P04001
BLAST UniProt Human OPN1MW Results
http://www.uniprot.org/uniprot/P04001
NCBI BLAST Home Page
http://blast.ncbi.nlm.nih.gov/
NCBI BLAST Parameters
http://blast.ncbi.nlm.nih.gov/
Entrez Gene search for Colorblindness
Entrez Gene search for Opsins
BLAST Similarity Search
http://www.ncbi.nlm.nih.gov/BLAST/
Choose Standard Protein-Protein BLAST
http://www.ncbi.nlm.nih.gov/BLAST/
Paste Sequence, Choose SwissProt Database and BLAST!
Optional Parameters
BLAST Conserved Domain Output
Sequence Aligned with Domain
Most Significant Similarity Hits
Bovine Blue Opsin Similarity
Evaluation of Profiles
Figure: Positive Proteins and Negative Proteins, divided into TP, TN, FP and FN
TP/FP = true/false positives; TN/FN = true/false negatives
Sensitivity: TP/(TP+FN)
Specificity: TN/(TN+FP)
Positive Predictive Value: TP/(TP+FP)
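The three measures above can be computed directly from the four counts. The function name `evaluate` is illustrative. The counts in the example are hypothetical and were chosen only to show the arithmetic.

```python
def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity and positive predictive value from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true family members found
        "specificity": tn / (tn + fp),   # fraction of non-members correctly rejected
        "ppv": tp / (tp + fp),           # fraction of reported hits that are real
    }


# Hypothetical profile search: 100 family members, 1,000 unrelated proteins.
print(evaluate(tp=90, fp=10, tn=990, fn=10))
# {'sensitivity': 0.9, 'specificity': 0.99, 'ppv': 0.9}
```

The measures diverge when negatives vastly outnumber positives. Keep the same 90 true positives and 10 misses, but add 100 false positives among 100,000 negatives. Specificity is then 0.999, while the positive predictive value falls to about 0.47. A profile can therefore look highly specific and still return mostly false hits.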
MyHits Local Motifs Search
http://myhits.isb-sib.ch/
MyHits Local Motifs Query
http://myhits.isb-sib.ch/
MyHits Local Motifs Summary
http://myhits.isb-sib.ch/
MyHits Local Motif Hits
http://myhits.isb-sib.ch/
MyHits Local Motifs List (Cont.)
http://myhits.isb-sib.ch/
InterPro
http://www.ebi.ac.uk/interpro/
InterProScan
http://www.ebi.ac.uk/interpro/
InterPro Scan
http://www.ebi.ac.uk/Tools/pfa/iprscan/
InterPro Scan Hourglass
http://www.ebi.ac.uk/InterProScan/
InterPro Scan Results
http://www.ebi.ac.uk/InterProScan/
GO: Gene Ontology Database
http://www.geneontology.org/
GO: Gene Ontology for Opsin OPN1MW
http://www.geneontology.org/
GO: Sequence Information for OPN1MW
http://www.geneontology.org/
GO: Annotations for OPN1MW
http://www.geneontology.org/
GO: Gene Ontology Terms for OPN1MW
http://www.geneontology.org/
GO: Gene Ontology Term GPCR
http://www.geneontology.org/
Bioinformatics Homework
http://biochem118.stanford.edu/bioinformatics.html
Homework Assignment
1) Select a protein from OMIM, Entrez Gene or UniProt concerning the disease of interest to you. Copy and save the FASTA format of the protein file.
2) Search your protein for motifs with the MyHits Motif Scan Query. Be sure to include Prosite Patterns, Prosite Frequent Patterns, Prosite Profiles, Prefiles and Pfam HMMs (local models) in your search. Please send me the MyHits results you think are biologically significant and at least 1 or 2 hits which you think are not statistically or biologically significant. Please note that only the Profiles have expectation values; the Patterns do not have a measure of statistical significance.
3) Search your protein for blocks using the InterPro database. Please send me a few of the InterPro domain hits you think are significant and at least 1 or 2 hits which you think are not statistically or biologically significant. Please note that the default graphic output of InterPro does not list expectation values. You must switch to the Tabular view to obtain the statistical significance.
4) Search your protein for homology using the BLAST method. Please report two or three hits which are both statistically and biologically significant. Also report two or three hits which you think are neither statistically nor biologically significant. If your protein family is very large, you may have to ask BLAST to return more hits in order to find statistically insignificant ones.
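For step 1 of the assignment, a saved FASTA file can be read with a few lines of Python. This is a minimal sketch with two assumptions: the record identifier is the first word after `>`, and sequence lines may be wrapped. `read_fasta` is an illustrative name. The example record is made up from the sequence fragment shown earlier in these notes.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            if name is not None:              # close the previous record
                records[name] = "".join(chunks)
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        elif name is not None:
            chunks.append(line)               # sequence lines may be wrapped
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>example hypothetical record
MVHLTPEEKT
AVNALWGKVN
"""
print(read_fasta(example))  # {'example': 'MVHLTPEEKTAVNALWGKVN'}
```

To read a saved file, pass its contents, e.g. `read_fasta(open("my_protein.fasta").read())`. The file name here is hypothetical. Biopython's `Bio.SeqIO.parse(handle, "fasta")` does the same job more robustly.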
Statistical vs. Biological Significance
Assignment
First, for each search (MyHits, InterPro and BLAST), I would like you to report some significant hits and describe why you think they are significant both statistically and biologically. Also report some statistically insignificant hits (and why), and tell me whether any of your statistically insignificant hits are still biologically significant. To remind you what I said in class: a statistically significant find in the database search is always biologically significant, but a biologically significant result in the search is not necessarily statistically significant.
Statistical significance and expectation values.
Statistical significance is determined by the expectation value (E-value). The E-value estimates how many hits at least this good you would expect to find by pure chance in a search of this size. A finding with an E-value of 1 or greater is not significant, because it could occur by pure chance. A finding with an E-value less than 10^-3 (one chance in a thousand) is generally considered statistically significant (unless, of course, you are doing 1,000 searches!). So the lower the expectation value, the more significant the finding. Findings between 10^-3 and 1 are in the so-called twilight zone and require further analysis or experiments to determine their validity.
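The E-value thresholds described above can be written as a small helper. The cut-offs (10^-3 and 1) are the ones given in the text. Multiplying the E-value by the number of searches is an added assumption: it is a simple Bonferroni-style way to express the "1,000 searches" caveat. `classify_evalue` is an illustrative name.

```python
def classify_evalue(evalue: float, n_searches: int = 1) -> str:
    """Label a hit using the thresholds from the text.

    Multiplying by the number of searches is an assumed Bonferroni-style
    correction for the "1,000 searches" caveat, not a rule stated in the text.
    """
    adjusted = evalue * n_searches
    if adjusted < 1e-3:
        return "significant"
    if adjusted < 1.0:
        return "twilight zone"
    return "not significant"


for e in (1e-50, 1e-5, 0.02, 3.0):
    print(e, classify_evalue(e))
print(classify_evalue(1e-4, n_searches=1000))  # twilight zone
```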
Statistical vs. Biological Significance (cont)
InterPro
Unlike most of the other methods, InterPro requires a very high level of significance before it will report a finding. This means that you will usually not find any statistically insignificant hits in this particular search.
Biological Significance
To determine biological significance, you must read the biological properties of your protein and the biological properties of your findings (ontology terms are the most useful). A finding may be significant because it defines any of the following:
- a very closely related protein family (opsins, for example);
- a very broad family (G-protein-coupled receptors or 7-transmembrane proteins);
- a common structure (protein fold);
- a specific function (retinal binding site);
- a very specific catalytic activity.
You should describe in words the level of the biological significance.
Statistical vs. Biological Significance (cont)
MyHits
If you ask MyHits to return PATTERNs as well as motifs, you will notice that PATTERNs do not have E-values associated with them, so there is no easy way to judge their statistical significance. With pattern findings you can judge only biological significance. Also, none of the frequent patterns from MyHits are statistically significant.
BLAST
If the BLAST search returns no insignificant hits, your protein family is very large. In that case, ask BLAST to return more results using the Advanced Options at the bottom of the form. Only once you see hits with E-values > 0.001 do you have insignificant findings.
Multiple Enhancer Sequences
Structure of 5' CAP
© Michael W. King
Polyadenylation of mRNAs
© Michael W. King
Intron Splicing Mechanism
© Michael W. King
Splicing, Capping & Polyadenylation Yields Mature mRNA
Figure: a Gene (Promoter, three Exons separated by two Introns, Terminator; TSS, TTS) is transcribed into a Transcript. Splicing, a 5' Cap and a 3' poly-A tail then produce the mRNA.
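Splicing, capping and polyadenylation can be mimicked on a toy transcript. This sketch makes several assumptions:

- the sequence is invented;
- exon coordinates are 1-based and inclusive;
- the introns follow the common GU...AG boundary rule;
- the 5' cap (7-methylguanosine) is represented only by the symbolic prefix `m7G-`.

`splice` and `mature_mrna` are illustrative names.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as 1-based, inclusive (start, end) coordinates, in 5'->3' order."""
    return "".join(pre_mrna[start - 1:end] for start, end in sorted(exons))


def mature_mrna(pre_mrna: str, exons: list[tuple[int, int]], tail_length: int = 20) -> str:
    """Spliced exons plus a symbolic 5' cap ('m7G-') and a 3' poly-A tail."""
    return "m7G-" + splice(pre_mrna, exons) + "A" * tail_length


# Toy transcript: three exons separated by two introns (GU...AG boundaries).
pre = "AUGGCC" + "GUAAGUCCCAG" + "GAUUUC" + "GUGAGUUUAG" + "UAA"
exons = [(1, 6), (18, 23), (34, 36)]
print(splice(pre, exons))                      # AUGGCCGAUUUCUAA
print(mature_mrna(pre, exons, tail_length=5))  # m7G-AUGGCCGAUUUCUAAAAAAA
```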
GENSCAN Gene Model
http://genes.mit.edu/GENSCAN.html
Hidden Markov models of gene structure
© Christopher Burge
Alternative Splicing Generates Distinct Proteins
Figure: one Gene (Promoter, three Exons, two Introns, Terminator) and its Transcript yield two mRNAs, mRNA-1 and mRNA-2, through Alternate Splicing. Each mRNA carries a 5' Cap and a 3' poly-A tail.
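Exon skipping, in which internal exons are either included or left out, is one simple way that alternative splicing produces distinct mRNAs such as mRNA-1 and mRNA-2. The sketch below enumerates every exon-skipping isoform of a toy gene. It assumes the first and last exons are always kept. It ignores other mechanisms: alternative 5' or 3' splice sites, intron retention and mutually exclusive exons. The exons are invented, and `exon_skipping_isoforms` is an illustrative name.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """All mRNAs that keep the first and last exon plus any in-order subset of internal exons."""
    first, *internal, last = exons  # needs at least two exons
    isoforms = []
    for k in range(len(internal), -1, -1):         # longest isoform first
        for kept in combinations(internal, k):     # combinations preserve exon order
            isoforms.append("".join([first, *kept, last]))
    return isoforms


print(exon_skipping_isoforms(["AUGGCC", "GAUUUC", "UAA"]))
# ['AUGGCCGAUUUCUAA', 'AUGGCCUAA']
```

A gene with n internal exons gives 2^n exon-skipping isoforms. This is one reason the same gene can appear as many different transcripts in EST libraries.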
ESTs, Full Length cDNA
UniGene & RefSeq Databases
Figure: Gene (Promoter, Exons, Introns, Terminator) → Transcript → Splicing → mRNA (Cap, 5' UTR, Protein, 3' UTR, poly-A), shown with 5' ESTs, 3' ESTs and a Full Length cDNA
Alternative Splicing Detected in EST Libraries
Gene Loci
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
Figure: a Gene Locus with Protein Sequences, mRNA Sequences and EST Sequences, and gene predictions from Genscan, GrailEXP and FGENES
Genomics, Bioinformatics & Computational Biology
Figure: Computational Biology, Computational Molecular Biology, Bioinformatics, Genomics, Proteomics, Structural Genomics, Systems Biology; Databases, Machine Learning, Robotics, Statistics & Probability, Artificial Intelligence, Graph Theory, Information Theory, Algorithms
Redundancy in Genomic & Protein Sequences
• DNA is double-stranded
• Genetic code
• Acceptable amino-acid replacements
• Intron-exon variation
• Alternative splicing
• Strain variations (SNPs)
• Sequencing errors
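The first two sources of redundancy in the list above are easy to make concrete. Each strand of a DNA duplex determines the other, and the genetic code maps 64 codons onto 20 amino acids plus stop signals. This sketch assumes the standard genetic code (NCBI translation table 1) and uses illustrative function names.

```python
from collections import Counter

# Standard genetic code (NCBI table 1): one letter per codon, bases in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The other strand of a DNA duplex, read 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def codons_per_amino_acid() -> dict[str, int]:
    """How many of the 64 codons encode each amino acid ('*' = stop)."""
    return dict(Counter(AMINO_ACIDS))


print(reverse_complement("ATGGTGCAT"))  # ATGCACCAT
degeneracy = codons_per_amino_acid()
print(degeneracy["L"], degeneracy["M"], degeneracy["*"])  # 6 1 3
```

Because leucine, serine and arginine each have six codons, different DNA sequences can encode the same protein. Comparisons at the protein level are therefore often more sensitive than at the DNA level.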
Hidden Markov Models (after Haussler)
http://www.cse.ucsc.edu/compbio/sam.html
Figure: HMM with states AA1 to AA6, I1 to I5 and D2 to D5
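Structurally, a profile HMM is a graph of states with allowed transitions. The sketch below builds that graph using the state names in the diagram (AA1 to AA6, I1 to I5, D2 to D5) and checks whether a given state path is allowed. It assumes the transition pattern of the Krogh/Haussler profile HMM: a match, insert or delete state at position k may move to match k+1, insert k or delete k+1. It omits the emission and transition probabilities, which a real model (for example in SAM or Pfam) would estimate from a multiple alignment. `profile_hmm_transitions` and `is_valid_path` are illustrative names.

```python
def profile_hmm_transitions(n: int) -> dict[str, list[str]]:
    """Allowed transitions for match states AA1..AAn, inserts I1..I(n-1), deletes D2..D(n-1)."""
    def with_delete(k: int, targets: list[str]) -> list[str]:
        # Delete states exist only for the interior match columns 2..n-1.
        return targets + [f"D{k}"] if 2 <= k <= n - 1 else targets

    edges: dict[str, list[str]] = {}
    for k in range(1, n + 1):
        edges[f"AA{k}"] = with_delete(k + 1, [f"AA{k + 1}", f"I{k}"]) if k < n else []
    for k in range(1, n):
        # The self-loop lets an insert state absorb any number of extra residues.
        edges[f"I{k}"] = with_delete(k + 1, [f"I{k}", f"AA{k + 1}"])
    for k in range(2, n):
        edges[f"D{k}"] = with_delete(k + 1, [f"AA{k + 1}", f"I{k}"])
    return edges


def is_valid_path(edges: dict[str, list[str]], path: list[str]) -> bool:
    """A path must start at AA1, follow allowed transitions and end at the last match state."""
    return (path[0] == "AA1" and edges[path[-1]] == []
            and all(b in edges[a] for a, b in zip(path, path[1:])))


edges = profile_hmm_transitions(6)
print(sum(len(v) for v in edges.values()))  # 39
print(is_valid_path(edges, ["AA1", "AA2", "I2", "I2", "AA3", "D4", "AA5", "AA6"]))  # True
print(is_valid_path(edges, ["AA1", "AA3", "AA4", "AA5", "AA6"]))  # False
```

The valid path aligns two extra residues to insert state I2 and skips column 4 through D4. The invalid one jumps from AA1 to AA3 without passing through a delete state.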
PFAM at Sanger Center (UK)
http://www.sanger.ac.uk/Software/Pfam/
PFAM at Sanger Center (UK)
http://pfam.sanger.ac.uk/
