UDC
Prediction of effects associated with single-point protein mutations and Study of mutation databases
http://202.113.20.161:8001/index.htm
2010
25
1120070340
DNA
DNA
RNA -
DNA-
HERG
HERG
PMD Database
Protein Mutant
Abstract
Mutations are changes in DNA sequences and include substitutions, insertions and deletions, amplifications, and other types. They can be roughly classified into natural mutations, randomly induced mutations, and site-directed induced mutations. In forward genetics, researchers start with a mutant phenotype arising from natural mutation or random mutagenesis experiments and work toward identifying the mutated genes. In reverse genetics, building on large-scale genome sequencing, researchers use site-directed mutagenesis experiments to study the functions of genes or other elements in DNA sequences, the structures of RNA or proteins, and other properties. Mutagenesis experiments play an indispensable role in basic biological research (e.g. investigating protein structure-function relationships and identifying DNA-protein interaction sites) and in applications (e.g. drug design and gene therapy). The rapidly accumulating molecular biology data from mutagenesis experiments have made it possible to study mutations systematically with bioinformatics methods. To facilitate such studies, a number of mutation databases have been developed. However, the heterogeneity of these databases makes it difficult to submit, exchange, and use mutation data. The Human Variome Project (HVP) was initiated to provide unified, standardized, high-quality mutation data, which raises the issue of integrating and standardizing existing mutation databases. Data mining and knowledge discovery based on mutation databases form another important class of tasks in HVP. Among these tasks, one of the most significant challenges is predicting the effects of protein point substitutions (mutations). Prediction results can guide biological experiments directly. Moreover, this kind of research lays a foundation for further study of related biological problems (e.g. studies of protein function).
The research presented in this dissertation consists of two parts. The first part (chapter 2) introduces HVP and its development, and then addresses problems related to the integration and standardization of mutation databases. We propose the Hierarchical Entity-Relation Graph (HERG) model, which can be used to depict published molecular biology databases graphically. The HERG model can also be extended into a basic model within a unified framework for standardizing heterogeneous databases. In the second part (chapter 3), we report a novel substitution-matrix-based kernel for support vector machines (SVMs) and its application to predicting the effects of protein point substitutions (mutations). We demonstrate the advantages of the new kernel over classical SVM kernels on a large dataset extracted from the Protein Mutant Database (PMD). We conclude this part with a discussion of the meaning of substitution-matrix-based kernel functions in terms of information theory.
Key words: mutation, prediction, SVMs, substitution matrix, human variome, data model, standardization
1.1.1
NIH
[1]
1.1.1.2
[2]
1.1 1.1
1.1
1.1 1.1
1.1.1.3
[3]
2000
[4]
2009
http://database.oxfordjournals.org
SVM
1.1.2
1.1.2.1 DNA
[7] [2]
DNA
variation
alteration
[8]
mutation
polymorphism
variation
[8][9][10][11]
SNPs[12]
1.1.2.2
[13]
10bp
point mutation substitution A T transversion synonymous mutation nonsynonymous or missense mutation frameshift mutation 1.2 G transition C/T
[14]
C A/G
DNA
neutral mutation
1.1.2.3
gene mutation wild type gene dominant mutation a a A back mutation or reversion insertion DNA repeating element deletion DNA DNA transposable element A a allele recessive mutation A
1.3
1.2
1.3A amplification or duplication gene duplication 1.3B rearrangement inversion gene 1.1.2.3 1.3C translocation DNA tandem duplication segmental duplication
fusion
natural variant random mutagenesis site-directed mutagenesis hereditary mutation germline mutation somatic mutation acquired mutation
de novo mutation mosaic mutation heterozygous mutation homozygous mutation mutation . loss-of-function mutation gain-of-function mutation
compound heterozygous
dominant negative mutation mutation mutation mutation DNA mutation in coding region mutation in intron region 1.1.2.4 forward genetics mutation in regulatory region lethal mutation
neutral
genotype
[13]
genetic variation
mapping
cloning
sequencing
1.1.2.5
genetic marker 4 4 marker morphological marker cytological marker DNA biochemical molecular marker
Denaturing Gradient Gel Electrophoresis Heteroduplex Analysis CDI CFLP of Mismatch PCR Conformational Polymorphism CCM dideoxy Fingerprinting DNA HA
Cleavage Fragment Length Polymorphism Enzyme Mismatch Cleavage ddF DNA chip Chemical Cleavage PCR
PCRPCR-SSCP
[18]
PCR Single-Strand
[19]
RFLP
pedigree
population
[21]
recombination frequency map unit Log Odds score Affected Sib Pair Member APM
[22]
LODs ASP
[19]
Disequilibrium
LD
[23]
Case-Control study
CC
CC
population stratification family based design Haploid-Relative-Risk Transmission Disequilibrium Test TDT
[21]
HRR
1.4
[24]
genotyping
DNA
1.5
PCR
DNA RPR
Random-Priming Recombination
1.6 PCR
[18]
1.5 M13 PCR PCR DNA M13 PCR PCR 1.6 Pfu-PCR
[18] [26]
Eckstein
UMP
[18]
Kunkel
PCR
DNA
[18]
1.1.2.5
[27]
1.1.3
1.7
1
1.1.3.1
2 Locus-Specific Databases
[29] [30]
262
[32] [33] [34]
2.1.1.1 1.1.3.2
LSDBs
[36]
HVP[37] 1.1.2.1
HVP
HVP
1.1.3.3
[38]
Mutation
[40] [41]
Martin ISAB
[43]
web
[48]
Greenblatt[51] Gefen 18
disease associated
SAP
nsSNPs
1.1.2.1
Decision Trees
DT
empirical rules
[46][63][64] [41][54-60] [65][66][67] [61][68-72] [61][62] [69]
mutagenesis 3
[73][74]
T4 lac repressor
bacteriophage T4 lysozyme
[75][76]
HIV-1 2.2.1
HIV-1
protease
[77]
SWISS-PROT/TrEMBL
40%
pseudo
pseudo 3.1.2
homologues / / identity
[57][69][71]
u 10
1.2.1
1
2 3
1.2.2
1
2 HERG 3 SVM
1.2.3
1 1.8 HERG
1.8
HERG
HERG
2.1.1
2.1.1.1
HGMD
[78-80]
in Man
OMIM
GDB
[83]
HGBASE
LSDBs
curation
2.1.1.2 1994 ASHG Disorders Research Centre Melbourne mutation databases Human Genome Organisation HUGO 1996 HUGO HUGO-MDI http 6 HVP
[86]
2001
ASHG
2.1.2
HVP HVP 2004 System HGVSYS HVP 2.1.2.1 HGVSYS HGVSYS HGVSYS 3 LSDBs HGVbase WayStation Central Database national and ethnic-specific mutation databases 2.1 HVP
[88]
2006
2.1
HGVSYS
[36]
WayStation
Human Mutation Genome Variation Reports PubMed PubMed ID 2.2 GVRs GVRs WayStation 2003 //www.centralmutations.org/ 2005 PubMed ID 7 http WayStation
HGVSYS
HGVSYS
92%
LSDBs http 1
HGVS
HERG
2.2
WayStation
LSDBs
WayStation
HERG
23 12
HVP
10
25
HERG
12 Coordinating office Howard Florey Institute HVP HVP 10-12 HVP The clinic and phenotype Richard Cotton 1996 &
/ Disease-Specific Database
/ HGVS
Variation/linkage of common
diseases/research laboratory
HVP
HVP
HVP
HVP
Translation
HVP
2.1.3
1996 2006 HUGO
2.1
2.1
HVP
[89 [95] [11] [38 [47 [98 [104] [106] [107] [78] [108 [113
[91] 1998 [96] [97] 2000 2000 [43 [50] [100] [105] 44]
[92] 2000
[93
94] 2002
2001 2000
2002 [51
[45 [101
46]
2004 2005
1998 [115]
[80 2009
85]
109] 2001
[110] 2003
2.1.4
HVP HVP
1. HVP HVP 2. HVP
3. HVP
HVP
4. HVP
5. HVP 6. HVP
HVP
1.1.3.1 2.2.2
2.2.3
2.2.4
2.2.1
2.1.1.1
Horaitis
[28]
Gene29
HERG
Horaitis HGVS LSDBs HUGO Gene Nomenclature Committee 2.3 disease-specific mutation databases AlzGene HGNC http
Disease Centered Central Mutation Databases system -specific mutation databases Mitochondrial Mutation Databases RNA //ribosome.fandm.edu HGVS SNP http //www.mitomap.org http 2.1.1.1 Central Mutation &
2.3 HGVS
30
LSDBs
HERG
SNP Databases
Horaitis
[28]
2.2.2
4 completeness availability 2.2.2.1 record or entry field domain 2.3.2 B quality nonredundancy
richness
2.2.2.2
2.2.2.3
2.2.2.4
Accessibility web
2.4
Entrez
local deployment Application Programming Interface API remote module XML Schema [122] eXtensible Mark-up Language SQL scripts XML
2.2.3
3 integration 2.2.3.1 View integration Warehouse
[123]
link
SRS ARX
[124]
Entrez
[125]
2.4
Entrez Federation
[126]
2.5
KIND
[127]
overview
2.2.3.2
Mediation
[126]
KIND
2.6 B/S
2.6
2.2.4
2.7
[123]
HERG
HERG
2.3.1
2.2.4 flat file relational XML 2.2 Polymorphism Markup Language PML
[138]
object
[138]
Proteomics
[137]
Standard Initiative Model Molecular Interaction format PSI-MI PharmGKB MAGE-OM MicroArray Gene Expression Object Model XGAP
[141]
[139]
the model of
www.xgap.org
Sequence Variation Markup Language GSVML Experiment Model FuGE GSVML HL7-CGM
the Health Level Seven Clinical the Functional Genomics XGAP FuGE
www.hl7.org
PML*#
PaGE-OM*# PSI-MI*#
International Warfarin Pharmacogenetics Consortium IWPC Microarray Gene Expression Data Group MGED Rosetta Agilent Company and Affymetrix Company Groningen Bioinformatics Center Tokyo Medical and Dental University HL7 Clinical Genomics Work Group Rosseta biosoftware Company 2.2
MAGE-ML *
MAGE-OM XML
XML #
XSD
2.3.2 HERG
4 Sequence Features Networks & Pathways Structures Terms & Nomenclatures Others Experiment
DNA RNA
2.8 HERG
2.3.2.2 HERG
2.9
2.9
2.3.3 HERG
HERG
[3]
1170 HERG
99
&
99
HERG SNPs
2.3.3.1 2000[4] 132 33 99 2.3.3.2 HERG HERG HERG 6 0 1 PMD HERG HERG 99 2.10A 99 HERG 1 0 4 2009[5] 1
PMD
41
[118]
HERG
2.10A
99
2.10 HERG
PMD
SNPeffect
2.3.3.3
cluster
database pattern
HERG 2.11
2.3.3.4 2 3 3 pattern6 4 2.11B 4 pattern1 3 2.11C pattern3 3 pattern7 3 pattern9 2.11 2.11A 2 pattern4
2.11
2 Genotype to Phenotype
[138]
G2P
3 / SNP
HERG
99 19/99
hotplot 8/99
1 111 112
[i]
[j]
SVM
SVM
HGP
3.1.1
3.1 i
3
$\cdots\ x_i^{-1}\ x_i^{0}\ x_i^{1}\ \cdots\ x_i^{n-1}\ x_i^{n}$
3.1
n=3
3.1.2
$x_i = x_i^{-n}\, x_i^{-n+1} \cdots x_i^{-1}\, x_i^{0}\, x_i^{1} \cdots x_i^{n-1}\, x_i^{n}$
20
20 0[145] 10000000000000000000
20 C Y 21 20 0 C
D 00100000000000000000 N
20 1 21
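The two 20-digit codes quoted above are consistent with one-hot encoding over the 20 amino acids taken in one-letter alphabetical order (A gives a 1 in the first slot, D in the third). The following is a minimal sketch under that assumption; the names `AMINO_ACIDS`, `one_hot` and `encode_window` are illustrative and not taken from the dissertation, and the sample window is the nine-residue string that appears later in this chapter.

```python
# Assumed residue order: one-letter codes sorted alphabetically.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-character 0/1 string with a single 1 at the residue's index."""
    idx = AMINO_ACIDS.index(residue)  # raises ValueError for unknown residues
    return "0" * idx + "1" + "0" * (len(AMINO_ACIDS) - idx - 1)


def encode_window(window):
    """Concatenate the one-hot codes of all residues in a window of length 2n+1."""
    bits = []
    for residue in window:
        bits.extend(int(c) for c in one_hot(residue))
    return bits


if __name__ == "__main__":
    print(one_hot("A"))
    print(one_hot("D"))
    print(len(encode_window("QSEPEDLLK")))  # 9 residues x 20 dimensions
```

A window of 2n+1 residues therefore becomes a vector of 20(2n+1) binary features, which is the input a classical inner-product kernel would operate on.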
http://www.genome.jp/aaindex/
3.1.2.3
3.2
[145]
3.2 5
S eq
QSEPEDLLK 20 N
20 v 9
AM
S eq
20
3.1.3
[148]
filter 2 3 SVM
/
performance evaluation
3.2.1
3.2.1.1
test
8 3.2.1.5
1 1 10-fold k-2
k k=10
k-1
SVM
3.2.1.2 Resubstitution
[149]
3.2.1.5
Cross Validation
l
k-fold
$k \ge 2$
k k
l
k-1 k k 3 k k
5 7 10
k k k
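The k-fold procedure named in this subsection can be sketched as follows: the l samples are shuffled once and split into k disjoint folds, and each fold serves as the test set exactly once while the other k-1 folds form the training set. This is an illustrative sketch only; the function name `k_fold_indices` and the fixed seed are assumptions, not part of the original work.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if k < 2 or k > num_samples:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, 10):
        print(len(train), len(test))
```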
3.2.2
binary 1/-1 1/0
ti yi
performance measures
l
$T = (t_1, \ldots, t_l)$, $Y = (y_1, \ldots, y_l)$
$l = P + N$
$\bar{t} = \frac{1}{l}\sum_{i=1}^{l} t_i$, $\bar{y} = \frac{1}{l}\sum_{i=1}^{l} y_i$
3.2.2.1
matrix
3.3
3.2.2.2
Ac
TP FP TN FN
Table 3.1
Sensitivity of positive examples: $Sn^{+} = TP / (TP + FN) = TP / P$
Specificity of positive examples: $Sp^{+} = TP / (TP + FP)$
Sensitivity of negative examples: $Sn^{-} = TN / (TN + FP) = TN / N$
Specificity of negative examples: $Sp^{-} = TN / (TN + FN)$
Overall Accuracy: $Ac = (TP + TN) / (TP + TN + FP + FN)$
Error Rate: $Err = 1 - Ac$
Sn Q + Sp
Q P
P + Q2
Sp Precision
binary 1
1/0
SVM
1
dp p=1 Lp L1
$d_p = \left( \sum_{i=1}^{l} |y_i - t_i|^p \right)^{1/p}$ (3.2)
3.2 ti d1 FP FN d1
d1 3.2
yi
d2
3.2.2.4
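$CC = \frac{(T - \bar{T})(Y - \bar{Y})^T}{\sqrt{[(T - \bar{T})(T - \bar{T})^T]\,[(Y - \bar{Y})(Y - \bar{Y})^T]}} = \frac{TY^T - l\,\bar{t}\,\bar{y}}{\sqrt{(TT^T - l\,\bar{t}^2)(YY^T - l\,\bar{y}^2)}}$ (3.3)
For binary (1/0) $t_i$ and $y_i$:
$TT^T = TP + FN$, $YY^T = TP + FP$, $TY^T = TP$
$\bar{t} = (TP + FN) / l$, $\bar{y} = (TP + FP) / l$
Substituting into 3.3:
$CC = \frac{TP - l\,\bar{t}\,\bar{y}}{l \sqrt{\bar{t}\,\bar{y}\,(1 - \bar{t})(1 - \bar{y})}}$ (3.4)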
3.4 CC Matthews Correlation Coefficient
$CC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FN)(TP + FP)(TN + FN)(TN + FP)}}$ (3.5)
Equation 3.5 is the Matthews Correlation Coefficient (MCC) [150].
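As a numerical check on the step from the Pearson form of the correlation coefficient to the confusion-matrix form, the sketch below computes both on a small made-up pair of binary (1/0) target and prediction vectors. The vectors and the function names are illustrative assumptions; only the two formulas come from the text.

```python
import math


def pearson_cc(t, y):
    """Pearson correlation between target vector t and prediction vector y."""
    l = len(t)
    t_bar = sum(t) / l
    y_bar = sum(y) / l
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    den = math.sqrt(sum((ti - t_bar) ** 2 for ti in t)
                    * sum((yi - y_bar) ** 2 for yi in y))
    return num / den


def confusion_counts(t, y):
    """Return (TP, FP, TN, FN) for binary 1/0 targets t and predictions y."""
    tp = sum(1 for ti, yi in zip(t, y) if ti == 1 and yi == 1)
    fp = sum(1 for ti, yi in zip(t, y) if ti == 0 and yi == 1)
    tn = sum(1 for ti, yi in zip(t, y) if ti == 0 and yi == 0)
    fn = sum(1 for ti, yi in zip(t, y) if ti == 1 and yi == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    den = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    return (tp * tn - fp * fn) / den


if __name__ == "__main__":
    targets = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]      # hypothetical labels
    predictions = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # hypothetical outputs
    print(pearson_cc(targets, predictions))
    print(mcc(*confusion_counts(targets, predictions)))
```

Note that `mcc` divides by zero when any marginal sum is zero (for example when nothing is predicted negative), which is the situation the approximate correlation discussed next is meant to handle.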
Approximate Correlation Coefficient
3.2.2.5 3.5
0 MCC
$TN + FN = 0$ Burset ACP
[150]
3.5
SVM
0
$ACP = \frac{1}{4}\left[ \frac{TP}{TP + FN} + \frac{TP}{TP + FP} + \frac{TN}{TN + FP} + \frac{TN}{TN + FN} \right]$ (3.6)
AC
ACP 3.7
3.2.2.6
Relative Entropy
Kullback Leibler KL
[150]
3.2.2.7 ROC
t i ! 1
TN TN TNR
ti ! 1
Negative Rate
FAR
H1
alternative hypothesis
ti ! 1
Mutual Information
y i ! 1
True FN FN
H0
False
F
TP TP TPR
t i ! 1
N False
3.4
FPR
3.9
0.05
E E
$TNR = TN / N = 1 - FP / N = 1 - \alpha$
$FNR = FAR = FN / P = \beta$
$TPR = TP / P = 1 - FN / P = 1 - \beta$
$FPR = FRR = FP / N = \alpha$
E U
3.9
F E
ROC
F
SVM
U F
3.4A
3.5
ROC
EER
FRR
FAR AUC
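To illustrate how an ROC curve and the area under it follow from the rates defined above, the sketch below sweeps a decision threshold over classifier scores, records one (FPR, TPR) point per threshold, and integrates with the trapezoid rule. The labels, scores and function names are made up for illustration and are not results from this work.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, from (0, 0) to (1, 1), for 1/0 labels and real scores."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # A sample is predicted positive when its score is >= the threshold.
    for theta in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(labels, scores) if s >= theta and t == 1)
        fp = sum(1 for t, s in zip(labels, scores) if s >= theta and t == 0)
        points.append((fp / num_neg, tp / num_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]               # hypothetical targets
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical decision values
    curve = roc_points(labels, scores)
    print(curve)
    print(auc(curve))
```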
3.3.1
SVM
[151]
[152]
3.3.1.1 SVM
over fitting
SVM
generalization min
A A.8
$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i$
SVM
SVM
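$\min_{\alpha}\ Q(\alpha) = \frac{1}{2}\alpha^T G \alpha - c^T \alpha$
s.t. $t^T \alpha = 0$, $[0, 0, \ldots, 0]^T \le \alpha \le [C, C, \ldots, C]^T$ (3.10)
$\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_l]^T$
$G_{ij} = t_i t_j K(\langle x_i, x_j \rangle)$, $i, j = 1, 2, \ldots, l$
$c = [1, 1, \ldots, 1]^T$
$t = [t_1, t_2, \ldots, t_l]^T$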
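$Q(\alpha) = Q(\alpha_0) + \nabla Q(\alpha_0)^T (\alpha - \alpha_0) + \frac{1}{2}(\alpha - \alpha_0)^T G (\alpha - \alpha_0)$
$Q(\alpha) \ge Q(\alpha_0) + \nabla Q(\alpha_0)^T (\alpha - \alpha_0)$
$(\alpha - \alpha_0)^T G (\alpha - \alpha_0) \ge 0$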
3.3.1.3 SVM
robustness
SVM
3.3.1.4 SVM
3.3.1.5
$\{(x_i, t_i) \mid x_i \in R^n,\ t_i \in \{-1, 1\},\ i = 1, \ldots, l\}$
$H = \{f : R^n \to Z\}$, $Z = \{1, 2, \ldots, k\}$
H
SVM
H
f A. SVM
$y = f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i t_i (x_i \cdot x) + b^* \right)$
SVM
3.3.2
3.3.1.5
[153]
Linear Function
RBF
Sigmoid
Polynomial Function
3.11
SVM
p=1 RBF
$K(x_i, x_j) = (1 + \langle x_i, x_j \rangle)^d$ Sigmoid
3.11
[153]
SVM
[153]
3.3.3
3.11
$K(\langle x_i, x_j \rangle)$
xi
xj
3.3.3.1 DNA similarity matrix amino acid similarity matrix amino acid mutation matrix
[147]
20 score matrix
[147]
1 2
PAM
PAM 34 85 relative mutability mutation probability matrix 10 PAM1 log odds matrix 1%
[154]
1572 71
10
3.6
BLOSUM62
PAM1
n
Swiss-Prot20
504
30%
3.1 PAM PAM BLOSUM BLOSUM
$x_j = x_j^{-n}\, x_j^{-n+1} \cdots x_j^{-1}\, x_j^{0}\, x_j^{1} \cdots x_j^{n-1}\, x_j^{n}$
$\langle x_i, x_j \rangle \to S(x_i, x_j)$
$S(x_i, x_j) = \sum_{u=-n}^{n} s(x_i^u, x_j^u)$ (3.12)
$x_i^u, x_j^u \in \chi$, the set of 20 amino acids:
$\chi = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y\}$
$s(x_i^u, x_j^u)$: 3.6 BLOSUM62
SVM
$K(x_i, x_j) = (1 + S(x_i, x_j))^d$
$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$, with $\|x_i - x_j\|^2 = S(x_i, x_i) + S(x_j, x_j) - S(x_i, x_j) - S(x_j, x_i)$
$K(x_i, x_j) = \tanh(k_1 S(x_i, x_j) + k_2)$ (3.13)
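The substitution-matrix kernels of 3.13 can be sketched in a few lines once the per-position score s(a, b) is available. In the sketch below, `toy_score` is a hypothetical stand-in (+4 on identity, -1 otherwise) and is not BLOSUM62; the parameter defaults and the two sample windows are likewise assumptions chosen only to make the arithmetic easy to follow.

```python
import math


def toy_score(a, b):
    """Hypothetical stand-in for a substitution-matrix entry s(a, b); NOT BLOSUM62."""
    return 4 if a == b else -1


def window_score(xi, xj, score=toy_score):
    """S(x_i, x_j): sum of per-position scores over two windows of length 2n+1."""
    if len(xi) != len(xj):
        raise ValueError("windows must have the same length")
    return sum(score(a, b) for a, b in zip(xi, xj))


def poly_kernel(xi, xj, d=2, score=toy_score):
    """Polynomial form: (1 + S(x_i, x_j))^d."""
    return (1 + window_score(xi, xj, score)) ** d


def squared_distance(xi, xj, score=toy_score):
    """||x_i - x_j||^2 = S(x_i,x_i) + S(x_j,x_j) - S(x_i,x_j) - S(x_j,x_i)."""
    return (window_score(xi, xi, score) + window_score(xj, xj, score)
            - window_score(xi, xj, score) - window_score(xj, xi, score))


def rbf_kernel(xi, xj, sigma=1.0, score=toy_score):
    """RBF form: exp(-||x_i - x_j||^2 / sigma^2)."""
    return math.exp(-squared_distance(xi, xj, score) / sigma ** 2)


def sigmoid_kernel(xi, xj, k1=0.01, k2=0.0, score=toy_score):
    """Sigmoid form: tanh(k1 * S(x_i, x_j) + k2)."""
    return math.tanh(k1 * window_score(xi, xj, score) + k2)


if __name__ == "__main__":
    wild, mutant = "QSEPEDLLK", "QSEPADLLK"  # windows differing at the centre
    print(window_score(wild, mutant))
    print(poly_kernel(wild, mutant), rbf_kernel(wild, mutant, sigma=10.0))
    print(sigmoid_kernel(wild, mutant))
```

Passing a real matrix lookup as the `score` argument would replace the toy scores without changing the kernel code.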
3.4.1
3.4.1.1 PMD pmdchseq.07Mar26.Z Visual Basic) PMD FUNCTION 40028 20 A234N A 234 234 A N B CHANGE ftp://spock.genes.nig.ac.jp/pub/pmd/ pmd.current 2000 12 24 VB
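The mutation notation quoted above (A234N: residue A at position 234 replaced by N) is straightforward to parse, and the parsed position can then be used to cut a window of 2n+1 residues around the mutated site. The sketch below assumes 1-based positions and pads beyond the termini with a placeholder character; the padding choice, the function names and the short test sequence are assumptions, not the extraction program described in this section.

```python
import re

# One wild-type residue, a 1-based position, one mutant residue, e.g. "A234N".
MUTATION_PATTERN = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(code):
    """Split a point-substitution code into (wild_type, position, mutant)."""
    match = MUTATION_PATTERN.match(code.strip())
    if match is None:
        raise ValueError("not a single-point substitution: %r" % code)
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def window_around(sequence, position, n, pad="-"):
    """Return the 2n+1 residues centred on a 1-based position, padded at the termini."""
    centre = position - 1
    chars = []
    for k in range(centre - n, centre + n + 1):
        chars.append(sequence[k] if 0 <= k < len(sequence) else pad)
    return "".join(chars)


if __name__ == "__main__":
    print(parse_mutation("A234N"))
    print(window_around("MKTAYIAKQR", 5, 2))  # hypothetical sequence
    print(window_around("MKTAYIAKQR", 1, 2))
```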
3.4.2
SVM SVM 3.11 3.13 LIBSVM[156] RBF $C = 1$, $1/\sigma^2 = 1/l$ SVM
3.4.2.1 SVM
RBF 3.7A 20
3.11
3.7 IP-SVM1
40
2 IP-SVM2 Eisenberg
[157]
0 2n 1 N C
3.4.2.3 SVM
SM-SVM BLOSUM62
RBF 3.6
3.11 /
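$s(a, b) = \frac{1}{\lambda} \log \frac{p_{ab}}{f_a f_b}$ (3.14)
$a, b \in \chi = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y\}$
3.4.3
Table 3.2
Classifier   TP      FP     TN     FN     Sn(+)    Sp(+)    Sn(-)
IP-SVM1      27628   9195   795    190    99.32%   75.03%   7.96%
IP-SVM2      27381   8977   1013   437    98.43%   75.31%   10.14%
SM-SVM       26466   7092   2899   1352   95.14%   78.87%   29.02%
Sn(+), Sp(+): sensitivity and specificity of positive examples; Sn(-): sensitivity of negative examples.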
SVM
Sp
Ac
MCC AUC
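The percentage columns of Table 3.2 can be recomputed from the four confusion-matrix counts with the definitions of Table 3.1. The sketch below does this for the three classifiers; the counts are copied from the table, while the dictionary layout and function name are illustrative. It also returns the specificity of negative examples and the overall accuracy, which follow from the same counts.

```python
# (TP, FP, TN, FN) as listed in Table 3.2.
TABLE_3_2 = {
    "IP-SVM1": (27628, 9195, 795, 190),
    "IP-SVM2": (27381, 8977, 1013, 437),
    "SM-SVM": (26466, 7092, 2899, 1352),
}


def measures(tp, fp, tn, fn):
    """Return the Table 3.1 measures as fractions in [0, 1]."""
    return {
        "Sn+": tp / (tp + fn),   # sensitivity of positive examples
        "Sp+": tp / (tp + fp),   # specificity of positive examples
        "Sn-": tn / (tn + fp),   # sensitivity of negative examples
        "Sp-": tn / (tn + fn),   # specificity of negative examples
        "Ac": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
    }


if __name__ == "__main__":
    for name, counts in TABLE_3_2.items():
        row = measures(*counts)
        print(name, {key: "%.2f%%" % (100 * value) for key, value in row.items()})
```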
SM-SVM IP-SVM1
IP-SVM2 SM-SVM
3.8
ROC
2n 1 Ac 2n 1 5 7
[158]
49
SVM
SVM
19
Ac
SVM
SM-SVM
Ac
4.3.2.6
3.9
3.4.3.3 SVM
U
TP FP TN FN
Ac t i ! {1, 1} U
U Ac Ac U
Ac
SVM
3.4.4
[159]
ruber)CBS01
D200G
R227C
R392A
3.10
3.5.1
3.5.1.1
3. $K(x_i, x_j) = K(x_j, x_i)$
'
[0
$K(x_i, x_j) = K(x_j, x_i)$
SVM
[160]
BLOSUM62 BLOSUM62
AAindex
PMD
3.5.2
RBF 3.13
$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$, with $\|x_i - x_j\|^2 = S(x_i, x_i) + S(x_j, x_j) - S(x_i, x_j) - S(x_j, x_i)$
$K(x_i, x_j) = \exp(-S(x_i, x_i)/\sigma^2) \times \exp(-S(x_j, x_j)/\sigma^2) \times \exp(S(x_i, x_j)/\sigma^2) \times \exp(S(x_j, x_i)/\sigma^2)$ (3.15)
$K_3(x_i, x_j) = \exp(S(x_i, x_j)/\sigma^2)$ (3.16)
Substituting 3.14 into 3.16:
$K_3(x_i, x_j) = \exp\left( \sum_{u=-n}^{n} s(x_i^u, x_j^u) / \sigma^2 \right)$
$K_3(x_i, x_j) = \exp\left[ \frac{1}{\lambda \sigma^2} \sum_{u=-n}^{n} \log \frac{\mu(x_i^u, x_j^u)}{\nu(x_i^u)\,\nu(x_j^u)} \right]$
$K_3(x_i, x_j) = \exp\left[ k \ln \frac{\prod_{u=-n}^{n} \mu(x_i^u, x_j^u)}{\prod_{u=-n}^{n} \nu(x_i^u) \prod_{u=-n}^{n} \nu(x_j^u)} \right]$
$K_3(x_i, x_j) = \left[ \frac{\prod_{u=-n}^{n} \mu(x_i^u, x_j^u)}{\prod_{u=-n}^{n} \nu(x_i^u) \prod_{u=-n}^{n} \nu(x_j^u)} \right]^k$
$k = \frac{1}{\sigma^2 \lambda} \log e$
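The chain of equalities for $K_3$ can be checked numerically: the exponential of the summed log-odds scores should equal the ratio of products raised to the power k. The sketch below does this on a two-letter toy alphabet. The joint probabilities, marginals, $\lambda$, $\sigma$ and the choice of base-2 logarithms are all assumptions made for the illustration; none of them are values from this work.

```python
import math

# Hypothetical joint substitution probabilities and matching marginals.
JOINT = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
MARGINAL = {"A": 0.5, "B": 0.5}
LAMBDA = 0.5   # assumed scaling factor of the log-odds scores
SIGMA = 2.0    # assumed kernel width


def s(a, b):
    """Log-odds score (1 / lambda) * log2(p_ab / (f_a * f_b))."""
    return math.log2(JOINT[(a, b)] / (MARGINAL[a] * MARGINAL[b])) / LAMBDA


def k3_from_scores(xi, xj):
    """K_3 as exp(S(x_i, x_j) / sigma^2) with S the sum of per-position scores."""
    total = sum(s(a, b) for a, b in zip(xi, xj))
    return math.exp(total / SIGMA ** 2)


def k3_from_products(xi, xj):
    """K_3 as [prod p / (prod f * prod f)]^k with k = log2(e) / (sigma^2 * lambda)."""
    k = math.log2(math.e) / (SIGMA ** 2 * LAMBDA)
    numerator = math.prod(JOINT[(a, b)] for a, b in zip(xi, xj))
    denominator = (math.prod(MARGINAL[a] for a in xi)
                   * math.prod(MARGINAL[b] for b in xj))
    return (numerator / denominator) ** k


if __name__ == "__main__":
    print(k3_from_scores("ABA", "AAB"))
    print(k3_from_products("ABA", "AAB"))
```

The factor `log2(e)` appears because the scores use base-2 logarithms while the exponential is natural; with natural-log scores k would reduce to 1 / (sigma^2 * lambda).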
[161]
xi
xj
xj
Method of Types
11.1.2
3.16
3.17
3.17 20
Px i
G
xi , x j
xi xi
u
xj
xi , x j
Q ab
Px i
b
Px j
xj
G Ra Rb
a 3.14
fa fb
3.14
p ab
H
xi , x j
D
Q ab
Px i Px j ! Rb
! Qab
xi
Ra
Px ! Ra
i
xj
Rb
3.18
K ( x i , x j ) ! 2 2 kn[ H ( Qab )] a {b
3.5.3
3.19
BLOSUM62 PSI-BLAST[162] Position-Specific Iterated BLAST 3.1.2.3 PSSM PSSM ortholog SM-SVM PSSM
3.5.4
SVM fold 1 sequence-based 10-fold 9 sequence-based sample-based fold 1 9/10 sequence-based sample-based 10-fold 10 3.2.1.5 9
sequence-based
[163]
[a][g]
3.4.1
[h]
[b][c]
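The difference between sample-based and sequence-based 10-fold cross validation discussed in this section comes down to how samples are assigned to folds: in the sequence-based variant, all mutations of one protein must land in the same fold, so no protein contributes to both training and test data. The sketch below shows one way to build such folds; the greedy balancing rule and all names are assumptions for illustration, not the procedure used in this work.

```python
from collections import defaultdict


def sequence_based_folds(protein_ids, k=10):
    """Assign sample indices to k folds so that samples sharing a protein share a fold."""
    groups = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        groups[protein].append(index)
    folds = [[] for _ in range(k)]
    # Place the largest proteins first, each into the currently smallest fold.
    for protein in sorted(groups, key=lambda p: (-len(groups[p]), p)):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[protein])
    return folds


if __name__ == "__main__":
    ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]  # hypothetical protein labels
    for number, fold in enumerate(sequence_based_folds(ids, k=3)):
        print(number, fold)
```

A sample-based split, by contrast, shuffles individual mutations and can therefore place mutations of the same protein on both sides of the train/test boundary.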
4.1.1 HERG
HERG HERG HERG HERG HERG annotation mining MeSH[164] Medical Subject Headings SO[165] Sequence Ontology ( databases) Medical Language System HERG HERG GOA[167] GO[166] Gene Ontology The Gene Ontology Annotation UMLS[168] OBO[123] the Open Biomedical Ontologies Unified /
4.1.2
4.1.2.1 PMD
2.2.1 N 19N
19
1.1.3.4
LSDBs LSDBs
2 PMD 3.4.1.1
4.1.2.3
3.4.3.1
Ac
Sn
Ac
9 1
90
[68]
[169]
SVM C
4.1.2.4
[143]
SVM
4.1.2.5
[159]
SVM
HERG HERG
4.3.1 HERG
HERG
4.3.2
4.3.2.1
3.19
[0]
[ ]
[53]
Swiss-Prot
disease
polymorphism 4.3.2.4
65% 78%
70% 75%
active pocket
[1] BISTIC Definition Committee. NIH Working Definition of Bioinformatics and Computational Biology [DB/OL]. www.bisti.nih.gov/docs/CompuBioDef.pdf, 2000-7-17 /20100127. [2] Crick F. Central Dogma of Molecular Biology [J]. Nature, 1970, Vol.227:561 /nar/about.html, 2000-7-17/20100127. [4] Andreas D B. The Molecular Biology Database Collection: an online compilation of relevant database resources [J]. Nucleic Acids Research, 2000, Vol.28(1):1 7. [5] Michael Y G, Guy R C. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009 [J]. Nucleic Acids Research, 2009, Vol.37 (Database issue):D1 D4. [6] Landsman D, Gentleman R, Kelso J, et al. DATABASE: A new forum for biological databases and curation [J]. Database, 2009, Vol. 2009, bap002; doi:10.1093/database/bap002 published on March 26, 2009 [7] Wikipedia. Mutation [DB/OL]. http://en.wikipedia.org/wiki/Mutation, 20100123/201001 23. [8] Cotton R G H. Communicating Mutation Modern Meanings and Connotations [J]. Human Mutation, 2002, Vol. 19(1):2 3. [9] Condit C M, Achter P J, Lauer I, Sefcovic E. The Changing Meanings of Mutation. A Contextualized Study of Public Discourse [J]. Human Mutation, 2002, Vol. 19(1):69 75. [10] Marshall J H. On The Changing Meanings of Mutation [J]. Human Mutation, 2002, Vol. 19(1):76 78. [11] Cotton R G H, Scriver C R. Proof of Vol. 12(1):1 [13] 174 [14] [15] 23 /20100123. [16] Ring H Z, Kwok P Y, Cotton R G. Human variome project: an international collaboration to catalogue human genetic variation [J]. Pharmacogenomics, 2006, Vol.7:969 972. [17] [18] 129/20100125. [20] Wikipedia. Genetic association [DB/OL]. http://en.wikipedia.org/wiki/Genetic_association, 2009126/20100125. [21]
77
563.
[12] Brookes A J. The essence of SNPs [J]. Gene, 1999, 8; Vol. 234(2):177 186.
EJ
[M]
/WebForms/WebDefines.aspx?searchword=%e7%aa%81%e5%8f%98%e4%bd%93 201001
[J] [M] 1
[DB/OL]
http://www.uscnlife.cn
/web/page/news6139.htm 2005-4-15/20100127. [22] 2005-4-15/20100127. [23] [24] 18. [25] [26] [27] ( ) T AP 2007 544 572. [28] Horaitis O, Cotton R G H. Human mutation databases [A]. In: Dracopoli N C, Haines J L, et al. eds. In Current protocols in human genetics [C]. New York: Wiley-Liss, 2003. pp. 7.11.1 7.11.11. [29] Marsh S, Kwok P, McLeod H L. SNP Databases and Pharmacogenetics: Great Start, but a Long Way to Go [J]. Human Mutation, 2002, Vol. 20(3):174 Mutation, 2000, Vol. 15(1):36 44. 179. [30] Porter C J, Talbott-Jr C C, Cuticchia A J. Central mutation databases a review [J]. Human [31] Claustres M, Horaitis O, Vanevski M, et al. Time for a unified system of mutation description and reporting: a review of locus specific mutation databases [J]. Genome Research, 2002, Vol. 12:680 688. [32] Human Genome Variation Society LSDB Core Data Integration Project [R]. Barcelona: Human Genome Variation Society Newsletter, 2008:2 Repositories [J]. Human Mutation, 2009, Vol. 30(4):493 4. 495. [33] den Dunnen J T, Sijmons R H, Andersen P S, et al. Sharing Data between LSDBs and Central [34] Howard H J, Horaitis O, Cotton R G H, et al. The Human Variome Project (HVP) 2009 Forum Towards Establishing Standards [J]. Human Mutation, 2010, Vol. 31(3):366 367. [35] Collins F S, Patrinos A, Jordan E, et al. Goals for the U.S. Human Genome Project: 1998-2003 [J]. Science, 1998, Vol. 282(5389):682 689. [36] Horaitis O, Cotton R G H. The Challenge of Documenting Mutation Across the Genome: the Human Genome Variation Society Approach [J]. Human Mutation, 2004, Vol. 23:447 452. [37] Cotton R G H, Auerbach A D, Axton M, et al. The Human Variome Project [J]. Science, 2008, Vol. 322(5903): 861 862. [38] Rogozin I B, Kondrashov F A, Glazko G V. Use of Mutation Spectra Analysis Software [J]. Human Mutation, 2001, Vol. 17(2):83 102. [39] Cox D G, Boillot C, Canzian F. Data Mining: Efficiency of Using Sequence Databases for Polymorphism Discovery [J]. Human Mutation, 2001, Vol. 17(2):141 150. 
[40] Collins A, Ennis S, Taillon-Miller P, et al. Allelic Association With SNPs: Metrics, Populations, and the Linkage Disequilibrium Map [J]. Human Mutation, 2001, Vol. 17(4):255 262. [41] Wang Z, Moult J. SNPs, Protein Structure, and Disease [J]. Human Mutation, 2001, Vol. 17(4):263 270. [42] Gut I G. Automation in Genotyping of Single Nucleotide Polymorphisms [J]. Human
78
http://www.uscnlife.cn/web/page/news6137.htm [DB/OL]. [M] [J] [M] 2 [M] 3 1 2006 http://en.wikipedia.org/wiki/Linkage_ 2005 2 Vol.10(1):92 96. 2002 178.
disequilibrium, 2010120/20100125.
Mutation, 2001, Vol. 17(6):475 492. [43] Martin A C R, Facchiano A M, Cuff A L, et al. Integrating Mutation Data and Structural Analysis of the TP53 Tumor-Suppressor Protein [J]. Human Mutation, 2002, Vol. 19(2):149 164. [44] Aerts J, Wetzels Y, Nadine C, et al. Data Mining of Public SNP Databases for the Selection of Intragenic SNPs [J]. Human Mutation, 2002, Vol. 20(3):162 173. [45] Lachmund P, Nebel I T, Fhrer D, et al. The Pedigree Tool: Web-Based Visualization of a Family Tree [J]. Human Mutation, 2004, Vol. 23(2):103 105. [46] Cai Z, et al. Bayesian approach to discovering pathogenic SNPs in conserved protein domains [J]. Human Mutation, 2004, Vol. 24(2):178 184. [47] Soussi T, Kato S, Levy P P, et al. Reassessment of the TP53 Mutation Database in Human Disease by Data Mining With a Library of TP53 Missense Mutations [J]. Human Mutation, 2005, Vol. 25(1):6 17. [48] Freimuth R R, Stormo G D, McLeod H L. PolyMAPr: Programs for Polymorphism Database Mining, Annotation, and Functional Analysis [J]. Human Mutation, 2005, Vol. 25(2):110 117. [49] Nalla V K, Rogan P K. Automated Splicing Mutation Analysis by Information Theory [J]. Human Mutation, 2005, Vol. 25(4):334 342. [50] Gao S, Zhang N, Duan G Y, et al. Prediction of function changes associated with single-point protein mutations using support vector machines (SVMs) [J]. Human Mutation, 2009, Vol. 30(8):1161 1166. [51] Greenblatt M S. Mutation clusters offer insight into predicting pathogenicity [J]. Human Mutation, 2010, Vol. 31(3): v. [52] Gefen A, Cohen R, Birk O S. Syndrome to gene (S2G): in-silico identification of candidate genes for human diseases [J]. Human Mutation, 2010, Vol. 31(3):229 236. [53] Care M A, Needham C J, Bulpitt A J, et al. Deleterious SNP prediction: be mindful of your training data! [J]. Bioinformatics, 2007, Vol. 23(6): 664 672. [54] Ng P C, Henikoff S. Predicting deleterious amino acid substitutions [J]. Genome Research, 2001, Vol. 11:863 874. [55] Sunyaev S, et al. 
Prediction of deleterious human alleles [J]. Human Molecular Genetics, 2001, Vol. 10(6):591 597. [56] Chasman D, Adams R M. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure based assessment of amino acid variation [J]. Journal of Molecular Biology, 2001, Vol. 307:683 706. [57] Saunders C T, Baker D. Evaluation of structural and evolutionary contributions to deleterious mutations prediction [J]. Journal of Molecular Biology, 2002, Vol. 322:891 901. [58] Ramensky V, Bork P, Sunyaev S R. 2002. Human non-synonymous SNPs: server and survey [J]. Nucleic Acids Research, Vol. 30:3894 3900. [59] Herrgard S, Cammer S A, Hoffman B T, et al. Prediction of deleterious functional effects of amino acid mutations using a library of structure-based function descriptors [J]. PROTEINS, 2003, Vol. 53(4): 806 816. [60] Clifford R J, Edmonson M N, Nguyen C, et al. Bioinformatics tools for single nucleotide polymorphism discovery and analysis [J]. Ann. N. Y. Acad. Sci., 2004, Vol. 1020:101 109.
[61] Krishnan V G, Westhead D R. A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function [J]. Bioinformatics, 2003, Vol. 19:2199-2209.
[62] Dobson R, et al. Predicting deleterious nsSNPs: an analysis of sequence and structural attributes [J]. BMC Bioinformatics, 2006, Vol. 7:217.
[63] Verzilli C J, John C W, Stallard N, et al. A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms [J]. Appl. Stat., 2005, Vol. 54:191-206.
[64] Needham C J, et al. Predicting the effect of missense mutations on protein function: analysis with Bayesian networks [J]. BMC Bioinformatics, 2006, Vol. 7:405.
[65] Ferrer-Costa C, Orozco M, de la Cruz X. Sequence-based prediction of pathological mutations [J]. Proteins, 2004, Vol. 57(4):811-819.
[66] Ferrer-Costa C, Orozco M, de la Cruz X. Use of bioinformatics tools for the annotation of disease-associated mutations in animal models [J]. Proteins, 2005, Vol. 61(4):878-887.
[67] Ferrer-Costa C, Orozco M, de la Cruz X. Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties [J]. Journal of Molecular Biology, 2002, Vol. 315(4):771-786.
[68] Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function [J]. Nucleic Acids Research, 2007, Vol. 35:3823-3835.
[69] Bao L, Cui Y. Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information [J]. Bioinformatics, 2005, Vol. 21:2185-2190.
[70] Yue P, Li Z, Moult J. Loss of protein structure stability as a major causative factor in monogenic disease [J]. Journal of Molecular Biology, 2005, Vol. 353:459-463.
[71] Yue P, Moult J. Identification and analysis of deleterious human SNPs [J]. Journal of Molecular Biology, 2006, Vol. 356:1263-1274.
[72] Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information [J]. Bioinformatics, 2006, Vol. 22(22):2729-2734.
[73] Alber T, et al. Temperature-sensitive mutations of bacteriophage T4 lysozyme occur at sites with low mobility and low solvent accessibility in the folded protein [J]. Biochemistry, 1987, Vol. 26:3754-3758.
[74] Rennell D, et al. Systematic mutation of bacteriophage T4 lysozyme [J]. Journal of Molecular Biology, 1991, Vol. 222:67-88.
[75] Markiewicz P, et al. Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as spacers which do not require a specific sequence [J]. Journal of Molecular Biology, 1994, Vol. 240:421-433.
[76] Suckow J, et al. Genetic studies of the Lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure [J]. Journal of Molecular Biology, 1996, Vol. 261:509-523.
[77] Loeb D D, Swanstrom R, Everitt L, et al. Complete mutagenesis of the HIV-1 protease [J]. Nature, 1989, Vol. 340:397-400.
[78] Krawczak M, Cooper D N. The human gene mutation database [J]. Trends in Genetics, 1997, Vol. 13:121-122.
[79] Cooper D N, Ball E V, Krawczak M. The human gene mutation database [J]. Nucleic Acids Research, 1998, Vol. 26(1):285-287.
[80] Krawczak M, Cooper D N. Human Gene Mutation Database. A biomedical information and research resource [J]. Human Mutation, 2000, Vol. 15(1):45-51.
[81] Lehväslaiho H, Stupka E, Ashburner M. Sequence variation database project at the European Bioinformatics Institute [J]. Human Mutation, 2000, Vol. 15(1):52-56.
[82] Hamosh A, Scott A F, Amberger J, et al. Online Mendelian Inheritance in Man (OMIM) [J]. Human Mutation, 2000, Vol. 15(1):57-61.
[83] Cuticchia A J. Future vision of the GDB human genome database [J]. Human Mutation, 2000, Vol. 15(1):62-67.
[84] Sherry S T, Ward M, Sirotkin K. Use of molecular variation in the NCBI dbSNP Database [J]. Human Mutation, 2000, Vol. 15(1):68-75.
[85] Brookes A J, Lehväslaiho H, Siegfried M, et al. HGBASE: a database of SNPs and other variations in and around human genes [J]. Nucleic Acids Research, 2000, Vol. 28(1):356-360.
[86] Cotton R G H. Progress of the HUGO Mutation Database Initiative: A Brief Introduction to the Human Mutation MDI Special Issue [J]. Human Mutation, 2000, Vol. 15(1):4-6.
[87] Cotton R G H, Kazazian-Jr H H. Human Mutation: The Official Journal of the Human Genome Variation Society (HGVS) [J]. Human Mutation, 2002, Vol. 19(1):1.
[88] Cotton R G H, participants of the 2006 Human Variome Project meeting. Recommendations of the 2006 Human Variome Project meeting [J]. Nature Genetics, Vol. 39(4):433-436.
[89] Beaudet A L, the Ad Hoc Committee on Mutation Nomenclature. Update on nomenclature for human gene mutations [J]. Human Mutation, 1996, Vol. 8(3):197-202.
[90] Beutler E, McKusick V A, Motulsky A, et al. Mutation nomenclature: nicknames, systematic names and unique identifiers [J]. Human Mutation, 1996, Vol. 8(3):203-206.
[91] Antonarakis S E, the Nomenclature Working Group. Recommendations for a nomenclature system for human gene mutations [J]. Human Mutation, 1998, Vol. 11(1):1-3.
[92] den Dunnen J T, Antonarakis S E. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion [J]. Human Mutation, 2000, Vol. 15(1):7-12.
[93] den Dunnen J T, Antonarakis S E. Mutation Nomenclature Extensions and Suggestions to Describe Complex Mutations: A Discussion [J]. Human Mutation, 2002, Vol. 20(5):403.
[94] Nebert D W. Proposal for an Allele Nomenclature System Based on the Evolutionary Divergence of Haplotypes [J]. Human Mutation, 2002, Vol. 20(6):463-472.
[95] Scriver C R, Nowacki P M, Lehväslaiho H. Guidelines and recommendations for content, structure and deployment of mutation databases [J]. Human Mutation, 1999, Vol. 13(5):344-350.
[96] Scriver C R, Nowacki P M, Lehväslaiho H, et al. Guidelines and recommendations for content, structure, and deployment of mutation databases. II. Journey in progress [J]. Human Mutation, 2000, Vol. 15(1):13-15.
[97] Cotton R G H, Horaitis O. Quality control in the discovery, reporting, and recording of genomic variation [J]. Human Mutation, 2000, Vol. 15(1):16-21.
[98] Brown A F, McKie M A. MuStaR and other software for locus-specific mutation databases [J]. Human Mutation, 2000, Vol. 15(1):76-85.
[99] Béroud C. UMD (universal mutation database): a generic software to build and analyse locus specific databases [J]. Human Mutation, 2000, Vol. 15(1):86-94.
[100] Fredman D, Jobs M, Strömqvist L, et al. DFold: PCR Design that Minimizes Secondary Structure and Optimizes Downstream Genotyping Applications [J]. Human Mutation, 2004, Vol. 24(1):1-8.
[101] Manaster C, Zheng W Y, Teuber M, et al. InSNP: A Tool for Automated Detection and Visualization of SNPs and InDels [J]. Human Mutation, 2005, Vol. 26(1):11-19.
[102] Fokkema I F, den Dunnen J T, Taschner P E. LOVD: Easy Creation of a Locus-Specific Sequence Variation Database Using an "LSDB-in-a-Box" Approach [J]. Human Mutation, 2005, Vol. 26(2):63-68.
[103] Béroud C, Hamroun D, Collod-Béroud G, et al. UMD (Universal Mutation Database): 2005 Update [J]. Human Mutation, 2005, Vol. 26(3):184-191.
[104] Smith T D, Cotton R G H. VariVis: A visualization toolkit for variation databases [J]. BMC Bioinformatics, 2008, Vol. 9:206.
[105] Brandon M C, Ruiz-Pesini E, Mishmar D, et al. MITOMASTER: A Bioinformatics Tool for the Analysis of Mitochondrial DNA Sequences [J]. Human Mutation, 2009, Vol. 30(1):1-6.
[106] Maurer S. Coping with change: intellectual property rights new legislation, and the Human Mutations Database Initiative [J]. Human Mutation, 2000, Vol. 15(1):22-29.
[107] Knoppers B M, Laberge C M. Ethical guideposts for allelic variation databases [J]. Human Mutation, 2000, Vol. 15(1):30-35.
[108] Lehnert V, Holzwarth J, Ott M, et al. A Semi-Automated System for Analysis and Storage of SNPs [J]. Human Mutation, 2001, Vol. 17(4):243-254.
[109] Zhang G, Zhang S Z, Chen W, et al. Go!Poly: A Gene-Oriented Polymorphism Database [J]. Human Mutation, 2001, Vol. 18(5):382-387.
[110] Stenson P D, Ball E V, Mort M, et al. Human Gene Mutation Database (HGMD): 2003 Update [J]. Human Mutation, 2003, Vol. 21(6):577-581.
[111] Tahira T, Baba S, Higasa K, et al. dbQSNP: A Database of SNPs in Human Promoter Regions With Allele Frequency Information Determined by Single-Strand Conformation Polymorphism-Based Methods [J]. Human Mutation, 2005, Vol. 26(2):69-77.
[112] Giardine B, Riemer C, Hefferon T, et al. PhenCode: Connecting ENCODE Data With Mutations and Phenotype [J]. Human Mutation, 2007, Vol. 28(6):554-562.
[113] Yip Y L, Famiglietti M, Gos A, et al. Annotating Single Amino Acid Polymorphisms in the UniProt/Swiss-Prot Knowledgebase [J]. Human Mutation, 2008, Vol. 29(3):361-366.
[114] Owen R P, Altman R B, Klein T E. PharmGKB and the International Warfarin Pharmacogenetics Consortium: The Changing Role for Pharmacogenomic Databases and Single-Drug Pharmacogenetics [J]. Human Mutation, 2008, Vol. 29(4):456-460.
[115] Rhee H, Lee J S. MedRefSNP: A Database of Medically Investigated SNPs [J]. Human Mutation, 2009, Vol. 30(3):E460-E466.
[116] Friedrich A, Garnier N, Gagnière N, et al. SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases [J]. Human Mutation, 2010, Vol. 31(2):127-135.
[117] Li J, Duncan D T, Zhang B. CanProVar: a human cancer proteome variation database [J]. Human Mutation, 2010, Vol. 31(3):219-228.
[118] Kawabata T, Ota M, Nishikawa K. The Protein Mutant Database [J]. Nucleic Acids Research, 1999, Vol. 27(1):355-357.
[119] Benson D A, Karsch-Mizrachi I, Lipman D J, et al. GenBank [J]. Nucleic Acids Research, 2008, Vol. 36(Database issue):D25-D30.
[120] The UniProt Consortium. The Universal Protein Resource (UniProt) [J]. Nucleic Acids Research, 2007, Vol. 35(Database issue):D193-D197.
[121] Hoffmann R. A wiki for the life sciences where authorship matters [J]. Nature Genetics, 2008, Vol. 40:1047-1051.
[122] Madria S, Passi K, Bhowmick S. An XML Schema integration and query mechanism system [J]. Data & Knowledge Engineering, 2008, Vol. 65(2):266-303.
[123] Stein L D. Integrating biological databases [J]. Nat Rev Genet., 2003, Vol. 4(5):337-345.
[124] Etzold T, Argos P. SRS: Information retrieval system for molecular biology data banks [J]. Methods in Enzymology, 1996, Vol. 266:114-128.
[125] Schuler G D, Epstein J A, Ohkawa H, et al. Entrez: molecular biology database and retrieval system [J]. Methods in Enzymology, 1996, Vol. 266:141-162.
[126] Kazemian M, Moshiri B, Nikhbakh H, et al. Architecture for Biological Database Integration [Z]. Artificial Intelligence and Machine Learning 2005 Conference (AIML 05), 2005 CICC, Cairo, Egypt.
[127] Gupta A, Ludäscher B, Martone M E. Knowledge-Based Integration of Neuroscience Data Sources [Z]. Proceedings of the 12th International Conference on Scientific and Statistical Database Management, 2000:39.
[128] Ritter O, Kocab P, Senger M, et al. Prototype implementation of the integrated genomic database [J]. Comput. Biomed. Res., 1994, Vol. 27(2):97-115.
[129] Hirakawa M, Tanaka T, Hashimoto Y, et al. JSNP: a database of common gene variations in the Japanese population [J]. Nucleic Acids Research, 2002, Vol. 30(1):158-162.
[130] Fredman D, Siegfried M, Yuan Y P, et al. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources [J]. Nucleic Acids Research, 2002, Vol. 30(1):387-391.
[131] International HapMap Consortium. The International HapMap Project [J]. Nature, 2003, Vol. 426(6968):789-796.
[132] Thorisson G A, Lancaster O, Free R C, et al. HGVbaseG2P: a central genetic association database [J]. Nucleic Acids Research, Vol. 37(Database issue):D797-D802.
[133] Bader G D, Donaldson I, Wolting C, et al. BIND--The Biomolecular Interaction Network Database [J]. Nucleic Acids Research, 2001, Vol. 29(1):242-245.
[134] Xenarios I, Salwínski L, Duan X J, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions [J]. Nucleic Acids Research, 2002, Vol. 30(1):303-305.
[135] Peri S, Navarro J D, Kristiansen T Z, et al. Human protein reference database as a discovery resource for proteomics [J]. Nucleic Acids Research, 2004, Vol. 32(Database issue):D497-D501.
[136] Zanzoni A, Montecchi-Palazzi L, Quondam M, et al. MINT: a Molecular INTeraction database [J]. FEBS Lett., 2002, Vol. 513(1):135-140.
[137] Whirl-Carrillo M, Woon M, Thorn C F, et al. An XML-Based Interchange Format for Genotype-Phenotype Data [J]. Human Mutation, 2008, Vol. 29(2):212-219.
[138] Brookes A J, Lehväslaiho H, Muilu J, et al. The phenotype and genotype experiment object model (PaGE-OM): a robust data structure for information related to DNA variation [J]. Human Mutation, 2009, Vol. 30(6):968-977.
[139] Hermjakob H, Montecchi-Palazzi L, Bader G, et al. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data [J]. Nature Biotechnology, 2004, Vol. 22(2):177-183.
[140] Spellman P T, Miller M, Stewart J, Troup C, et al. Design and implementation of microarray gene expression markup language (MAGE-ML) [J]. Genome Biology, 2002, Vol. 3(9):RESEARCH0046.
[141] Nakaya J, Hiroi K, Ido K, et al. An Overview of Genomic Sequence Variation Markup Language (GSVML) [J]. AMIA Annu Symp Proc., 2006, Vol. 2006:1043.
[142] Jones A R, Miller M, Aebersold R, et al. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics [J]. Nature Biotechnology, 2007, Vol. 25(10):1127-1133.
[143] Reumers J, Conde L, Medina I, et al. Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases [J]. Nucleic Acids Research, 2008, Vol. 36(Database issue):D825-D829.
[144] Yip Y L, Scheib H, Diemand A V, et al. The Swiss-Prot Variant Page and the ModSNP Database: A Resource for Sequence and Structure Information on Human Protein Variants [J]. Human Mutation, 2004, Vol. 23(5):464-470.
[145] 2005, Vol. 31(3):229-235.
[146] Hua S J, Sun Z R. A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach [J]. Journal of Molecular Biology, 2001, Vol. 308:397-407.
[147] Tomii K, Kanehisa M. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins [J]. Protein Eng., 1996, Vol. 9:27-36.
[148] 2004, Vol. 31(11):180-184.
[149] ( ) [M]
[J]
[J] 3
2004 473-475
[150] Baldi P, Brunak S, Chauvin Y, et al. Assessing the accuracy of prediction algorithms for classification: an overview [J]. Bioinformatics, 2000, Vol. 16(5):412-424.
[151]
[152] 41
[153] Vol. 5(4):501-504
, 4 ,
, [M] 1 [J]
[M]
1 2006 37 ( ) 2006
2006 1
[154] Dayhoff M O, Schwartz R M, Orcutt B C. A model for evolutionary change in proteins [J]. Atlas of Protein Sequence and Structure, 1978, Vol. 5(Suppl. 3):345-352.
[155] Henikoff S, Henikoff J G. Amino acid substitution matrices from protein blocks [J]. Proc. Natl Acad. Sci. USA, 1992, Vol. 89:10915-10919.
[156] Hsu C W, et al. A practical guide to support vector classification [DB/OL]. www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, 2003-x-x/20100403.
[157] Eisenberg D, Weiss R M, Terwilliger T C. The hydrophobic moment detects periodicity in protein hydrophobicity [J]. Proc Natl Acad Sci USA, Vol. 81(1):140-144.
[158] Capriotti E, Fariselli P, Calabrese R, et al. Predicting protein stability changes from sequences using support vector machines [J]. Bioinformatics, 2005, Vol. 21(Suppl.2):ii54-ii58.
[159] 2009, Vol. 36(5):658-665.
[160] 2004 37
[161] ( ) Cover T M 2007 198-199
[162] Altschul S F, Madden T L, Schäffer A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J]. Nucleic Acids Research, 1997, Vol. 25(17):3389-3402.
[163] Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy [J]. Journal of Molecular Biology, 1993, Vol. 232(2):584-599.
[164] Nelson S J, Schopen M, Savage A G, et al. The MeSH Translation Maintenance System: Structure, Interface Design, and Implementation [J]. Stud Health Technol Inform., 2004, Vol. 107(Pt1):67-69.
[165] Eilbeck K, Lewis S E, Mungall C J, et al. The Sequence Ontology: a tool for the unification of genome annotations [J]. Genome Biology, 2005, Vol. 6(5):R44.
[166] Gene Ontology Consortium. The Gene Ontology (GO) project in 2006 [J]. Nucleic Acids Research, 2006, Vol. 34(Database issue):D322-D326.
[167] Camon E, Barrell D, Lee V, Dimmer E, Apweiler R. The Gene Ontology Annotation (GOA) Database--An Integrated Resource of GO Annotations to the UniProt Knowledgebase [J]. In Silico Biology, 2004, Vol. 4(1):5-6.
[168] Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology [J]. Nucleic Acids Research, 2004, Vol. 32(Database issue):D267-D270.
[169] [J]. 2008, Vol. 35(4):10-13.
[170] Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure [J]. Nucleic Acids Research, 2005, Vol. 33(Web Server issue):W306-W310.
[171] Capriotti E, Fariselli P, Casadio R. A neural-network-based method for predicting protein stability changes upon single point mutations [J]. Bioinformatics, 2004, Vol. 20(Suppl.1):i63-i68.
[172] Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines [J]. Proteins, 2006, Vol. 62(4):1125-1132.
[173] Guo J, Chen H, Sun Z, et al. A novel method for protein secondary structure prediction using dual-layer SVM and profiles [J]. Proteins, 2004, Vol. 54(4):738-743.
[174] Cai Y D, Liu X J, Xu X, Zhou G P. Support vector machines for predicting protein structural class [J]. BMC Bioinformatics, 2001, Vol. 2:3.
[175] Cai Y D, Zhou G P, Chou K C. Support vector machines for predicting membrane protein types by using functional domain composition [J]. Biophys. J., 2003, Vol. 84(5):3257-3263.
[176] Park K J, Kanehisa M. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs [J]. Bioinformatics, 2003, Vol. 19(13):1656-1663.
[177] Cai Y D, Lin S L, Chou K C. Support vector machines for prediction of protein signal sequences and their cleavage sites [J]. Peptides, 2003, Vol. 24(1):159-161.
[178] Cai Y D, Liu X J, Xu X B, et al. Support vector machines for predicting HIV protease cleavage sites in protein [J]. J. Comput. Chem., 2002, Vol. 23(2):267-274.
[179] Hansen J E, Lund O, Engelbrecht J, et al. Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc: polypeptide N-acetylgalactosaminyl transferase [J]. Biochem. J., 1995, Vol. 308(Pt3):801-813.
[180] Caragea C, Sinapov J, Silvescu A, et al. Glycosylation site prediction using ensembles of Support Vector Machine classifiers [J]. BMC Bioinformatics, 2007, Vol. 8:438.
[181] Kim J H, Lee J, Oh B, et al. Prediction of phosphorylation sites using SVMs [J]. Bioinformatics, 2004, Vol. 20(17):3179-3184.
[182] Zavaljevski N, Stevens F J, Reifman J. Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions [J]. Bioinformatics, 2002, Vol. 18(5):689-696.
[183] Zhang T, Zhang H, Chen K, et al. Accurate sequence-based prediction of catalytic residues [J]. Bioinformatics, 2008, Vol. 24(20):2329-2338.
[184] Zhang H, Zhang T, Chen K, et al. Sequence based residue depth prediction using evolutionary information and predicted secondary structure [J]. BMC Bioinformatics, 2008, Vol. 9:388.
[185] Wang L H, Liu J, Li Y F, et al. Predicting protein secondary structure by a support vector machine based on a new coding scheme [J]. Genome Inform., 2004, Vol. 15(2):181-190.
[186] [DB/OL] http://zh.wikipedia.org/wiki/ , 20100220/20100220.
[187] [DB/OL] http://www.hudong.com/wiki/%E6%9C%BA%E5%
[188] [DB/OL] http://define.cnki.net/WebForms 20100123/20100123.
[189] ( ) Mitchell T M [M] 1 2003 15-20.
[190] ( ) Vapnik V N [M] 1 2004 96-105.
Jack
u u
Richard.G.H.Cotton
[186]
Machine Learning
[187] [188]
Learning Learning /
Ensemble
A.1
[151]
A.1
concept acquisition
[189]
A.1
SVM
[152]
$\{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\ t_i \in \{-1, 1\},\ i = 1, \dots, l\}$
R 1 -1 ne3
i 0
OSH
A.2
n=3
G
n
$g(x) = \langle w, x \rangle + b = 0,\ x \in \mathbb{R}^n$ n=3 G-
w, x " G
A.2B G+ G G+
G-
$\langle w, x_i \rangle + b \ge 1,\ t_i = 1$ (A.1)
$\langle w, x_i \rangle + b \le -1,\ t_i = -1$ (A.2)
$t_i [\langle w, x_i \rangle + b] \ge 1,\ i = 1, 2, 3, \dots, l$ (A.3)
G+ G d! G+
$d = 1/\|w\|$, $2/\|w\|$
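The separability constraints and the margin expressions above can be checked numerically. The following is a minimal Python sketch and not part of the original thesis; the function names `satisfies_constraints` and `margin_width` and the two-point toy data set are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def satisfies_constraints(w, b, samples, labels):
    """Check t_i * (<w, x_i> + b) >= 1 for every training sample."""
    return all(t * (dot(w, x) + b) >= 1 for x, t in zip(samples, labels))

def margin_width(w):
    """Width 2 / ||w|| of the band between the two supporting hyperplanes."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical toy data: one positive and one negative sample.
samples = [[2.0, 0.0], [-2.0, 0.0]]
labels = [1, -1]
w, b = [0.5, 0.0], 0.0

print(satisfies_constraints(w, b, samples, labels))
print(margin_width(w))
```

Shrinking `w` widens the margin, but only as long as every sample still satisfies the constraint, which is the trade-off the optimal separating hyperplane resolves.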
$\frac{|\langle w, x_i \rangle + b|}{\|w\|}$, $x_i \in G$ (A.4)
$d = 1/\|w\|$
A.4
$\langle w, x_i \rangle + b = \pm 1,\ t_i = \pm 1$
G
w
OSH
A.5
w*
b*
$y = f(x) = \operatorname{sgn}(\langle w^*, x \rangle + b^*)$
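As an illustration of the decision function above, the following Python sketch classifies a point from a given weight vector and bias. It is not taken from the thesis: the function name `linear_decision`, the example values of `w_star` and `b_star`, and the convention of mapping a score of exactly zero to +1 are all assumptions.

```python
def linear_decision(w_star, b_star, x):
    """Return sgn(<w*, x> + b*) as +1 or -1.

    A score of exactly 0 is mapped to +1 here; this tie-breaking rule is
    an assumption of the sketch, not something the text specifies.
    """
    score = sum(wi * xi for wi, xi in zip(w_star, x)) + b_star
    return 1 if score >= 0 else -1

# Hypothetical parameters of an already-trained separating hyperplane (n = 3).
w_star = [1.0, -1.0, 0.5]
b_star = -0.25

print(linear_decision(w_star, b_star, [2.0, 0.0, 1.0]))
print(linear_decision(w_star, b_star, [0.0, 2.0, 0.0]))
```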
x
A.5
y lagrange
A.5
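$\max_{\alpha} Q(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j t_i t_j \langle x_i, x_j \rangle$ (A.6)
$\text{s.t.}\ \sum_{i=1}^{l} t_i \alpha_i = 0,\ \alpha_i \ge 0,\ i = 1, 2, 3, \dots, l$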
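To make the dual problem (A.6) concrete, the sketch below evaluates the objective Q for a given vector of multipliers and checks the two constraints. It only evaluates the formula and does not solve the optimisation; the names `dual_objective` and `is_feasible` and the toy data are hypothetical.

```python
def dual_objective(alpha, t, X):
    """Q(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j t_i t_j <x_i, x_j>."""
    l = len(alpha)
    quad = 0.0
    for i in range(l):
        for j in range(l):
            inner = sum(a * b for a, b in zip(X[i], X[j]))
            quad += alpha[i] * alpha[j] * t[i] * t[j] * inner
    return sum(alpha) - 0.5 * quad

def is_feasible(alpha, t, tol=1e-9):
    """Check sum_i t_i alpha_i = 0 and alpha_i >= 0 (up to a tolerance)."""
    balanced = abs(sum(ti * ai for ti, ai in zip(t, alpha))) <= tol
    return balanced and all(a >= -tol for a in alpha)

# Hypothetical toy problem: two samples on the first axis.
X = [[1.0, 0.0], [-1.0, 0.0]]
t = [1, -1]
alpha = [0.5, 0.5]

print(is_feasible(alpha, t))
print(dual_objective(alpha, t, X))
```

For this toy problem the multipliers give w = sum_i alpha_i t_i x_i = (1, 0), so the margin 2/||w|| equals 2, the distance between the two samples.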
A.2A
$\varphi$ $x$
Hilbert
$\varphi(x)$
A.7
min
A.8
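$\max_{\alpha} Q(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j t_i t_j \langle \varphi(x_i), \varphi(x_j) \rangle$ (A.9)
$\text{s.t.}\ \sum_{i=1}^{l} t_i \alpha_i = 0,\ 0 \le \alpha_i \le C,\ i = 1, 2, 3, \dots, l$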
$\langle \varphi(x_i), \varphi(x_j) \rangle$
$K(x_i, x_j)$
$\langle \varphi(x_i), \varphi(x_j) \rangle$
$K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ $K(x_i, x_j)$
$\varphi$
A.9
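$\max_{\alpha} Q(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j t_i t_j K(x_i, x_j)$ (A.10)
$\text{s.t.}\ \sum_{i=1}^{l} t_i \alpha_i = 0,\ 0 \le \alpha_i \le C,\ i = 1, 2, 3, \dots, l$
A.8-10 $\alpha_i$ A.9-10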
C 0 0
$\alpha_i$
Support Vector
$y = f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i t_i K(x_i, x) + b^*\right)$
SV
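The kernel form of the decision function can be sketched in Python as follows. The appendix leaves the kernel K generic, so the Gaussian (RBF) kernel used here, its `gamma` value, and the support vectors, multipliers and bias in the example are assumptions made only so that the sketch runs.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||u - v||^2); one possible choice of K."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def kernel_decision(support_vectors, alphas, labels, b_star, x, kernel=rbf_kernel):
    """Return sgn(sum_i alpha_i * t_i * K(x_i, x) + b*) as +1 or -1."""
    score = sum(a * t * kernel(sv, x)
                for sv, a, t in zip(support_vectors, alphas, labels)) + b_star
    return 1 if score >= 0 else -1

# Hypothetical trained model with two support vectors.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
b_star = 0.0

print(kernel_decision(support_vectors, alphas, labels, b_star, [0.1, 0.1]))
print(kernel_decision(support_vectors, alphas, labels, b_star, [1.9, 1.9]))
```

Only samples with non-zero multipliers contribute to the sum, which is why the model can be stored as the support vectors alone.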
$\{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\ t_i \in \{-1, 1\},\ i = 1, \dots, l\}$
A.11
A.12
R( f )
Remp ( f )
p ( x, t )
c( f (x), t )
$p(x, t)$ $l \to \infty$
$R_{emp}(f) \to R(f)$
p
R( f )
p
A.3
A.3
[152]
Vapnik
[190]
$R(f) \le R_{emp}(f) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}$ (A.13)
$0 \le \eta \le 1$
$h$
VC
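The behaviour of the bound (A.13) is easiest to see numerically. The sketch below assumes the standard form of Vapnik's bound, with h the VC dimension, l the number of training samples and eta the confidence parameter; the function names `vc_confidence` and `risk_bound` and the example values are hypothetical.

```python
import math

def vc_confidence(h, l, eta):
    """Confidence term sqrt((h * (ln(2l / h) + 1) - ln(eta / 4)) / l)."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

def risk_bound(r_emp, h, l, eta):
    """Upper bound on the expected risk: empirical risk plus confidence term."""
    return r_emp + vc_confidence(h, l, eta)

# Hypothetical values: the term shrinks as l grows and grows with h.
for l in (1_000, 10_000, 100_000):
    print(l, round(vc_confidence(h=10, l=l, eta=0.05), 4))
```

For a fixed sample size, a smaller VC dimension tightens the bound, which is the motivation for structural risk minimisation over plain empirical risk minimisation.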
$R(f) \to \min R(f)$
$R_{emp}(f) \to \min R(f)$
A.13
FH
A.13
h l h
A.4
A.4
[152]
B
2.2.1 3 HGMD PMD SNPeffect
The Human Gene Mutation Database the Institute of Medical Genetics http://www.hgmd.cf.ac.uk inherited disease HGMD HGMD 3 Professional HGMD gene symbol ID gene-centered web gene description disease/phenotype ARX OMIM 5 ID HGMD 250
HGMD
GDB
4 gene symbol Chromosomal location sequence viewer exon NCBI Extended cDNA 25bp Gene name Accession number Splice junctions 25bp intron Mutation type Number of mutations
cDNA
NCBI
Mutation viewer
Mutation viewer
Mutation data by type Regulatory deletions Small deletions variations Splicing Small insertions
Mutation data by disease/phenotype First published mutation report External links PubMed PubMed
B.1 HGMD
ARX
PMD
PMD Protein Mutant Database Genetics DNA Data Bank
[118]
National Institute of Center for Information Biology and PMD 1970 immunoglobulin PMD 18 B.2 7
DNA
globin
ENTRY
AUTHORS JOURNAL TITLE PURPOSE CROSS-REFERENCE ID PROTEIN SOURCE N N-TERMINAL N Swiss-Prot PDB NCBI
EXPRESSION-SYSTEM Yeast Human kidney 293 cells CHANGE FUNCTION + [0] [+] [-]
Escherichia coli
[-]
[=]
[+ +] [0]
[- -]
[+] [=]
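The bracketed symbols listed above for the PMD CHANGE/FUNCTION fields can be turned into numeric labels for downstream analysis. The sketch below is an assumption-laden illustration rather than the thesis's own procedure: it assumes that runs of `+` and `-` mark increasing degrees of gain and loss, that `[=]` means no change and that `[0]` means the property is abolished, and the function name `change_score` is hypothetical.

```python
import re

# Accepts [0], [=], or a run of '+' or '-' signs such as [++] or [--].
_SYMBOL = re.compile(r"\[(0|=|\++|-+)\]")

def change_score(symbol):
    """Map a change symbol to a signed integer, or None for [0].

    Spaces inside the brackets (e.g. "[+ +]") are ignored. The numeric
    scale is an assumption of this sketch.
    """
    compact = symbol.replace(" ", "")
    match = _SYMBOL.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognised change symbol: {symbol!r}")
    body = match.group(1)
    if body == "=":
        return 0          # no change
    if body == "0":
        return None       # property abolished (assumed meaning)
    sign = 1 if body[0] == "+" else -1
    return sign * len(body)

for s in ["[+]", "[+ +]", "[=]", "[-]", "[- -]", "[0]"]:
    print(s, change_score(s))
```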
B.2 HGMD
ARX
TRANSPORT DISEASE
SNPeffect
SNPeffect SNP http://pupasuite.bioinfo.cipf.es SNP PupaSuite SNPs 14935
[143]
Free University of Brussels Switch http://snpeffect.vib.be SNPeffect SNPs 43797 SNPs 4965073 133505
B.3 SNPeffect
SNP
23948838 7 SNP
7 Wild Type Functional Sites PupaSuite Molecular phenotype ID Protein Molecular YES/NO
Structure & Dynamics Cellular Processing Links SNP SNP Allele string phenotype Wild Type Identifiers Sequence SNP Disease Identifiers
SNP Sequence
Disease
Structure & Dynamics Stability Transmembrane regions Functional Sites Chaperone binding Cellular Processing Phosphorylation PupaSuite Triplex Links SNP TFBS SpliceSites SNP SNP related links Gene related links ESE N-
Catalytic sites
N-glycosylation
Acetylation
C HERG
XML Schema
<?xml version="1.0" encoding="utf-8" ?>
<!--Created with Liquid XML Studio - 30 Day Trial Edition 7.1.5.1419 (http://www.liquid-technologies.com)-->
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="SequencesAndGenome"> <xs:all> <xs:element name="landmark"> <xs:complexType /> </xs:element> <xs:element name="DNA"> <xs:complexType /> </xs:element> <xs:element name="chromosome_breakpoint"> <xs:complexType /> </xs:element> <xs:element name="RNA"> <xs:complexType /> </xs:element> <xs:element name="protein"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> <xs:complexType name="Sequence_Features"> <xs:all> <xs:element name="DNA"> <xs:complexType> <xs:all> <xs:element name="TR_site">
<xs:complexType /> </xs:element> <xs:element name="exon_and_intron"> <xs:complexType /> </xs:element> <xs:element name="UTR"> <xs:complexType /> </xs:element> <xs:element name="sequence_variant"> <xs:complexType> <xs:all> <xs:element name="SNP"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> </xs:element> <xs:element name="RNA"> <xs:complexType> <xs:all> <xs:element name="splice_site"> <xs:complexType /> </xs:element> <xs:element name="sequence_variant"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType />
</xs:element> </xs:all> </xs:complexType> </xs:element> <xs:element name="protein"> <xs:complexType> <xs:all> <xs:element name="polypeptide_motif_or_domain"> <xs:complexType /> </xs:element> <xs:element name="catalytic_residue"> <xs:complexType /> </xs:element> <xs:element name="signal_peptide"> <xs:complexType /> </xs:element> <xs:element name="binding_site"> <xs:complexType /> </xs:element> <xs:element name="post_translational_site"> <xs:complexType /> </xs:element> <xs:element name="sequence_variant"> <xs:complexType /> </xs:element> <xs:element name="missense_or_nonsense"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> </xs:element>
</xs:all> </xs:complexType> <xs:complexType name="Molecular_Phenome"> <xs:all> <xs:element name="single_molecular"> <xs:complexType> <xs:all> <xs:element name="gene_or_protein_property"> <xs:complexType /> </xs:element> <xs:element name="gene_or_protein_function"> <xs:complexType /> </xs:element> <xs:element name="protein_interaction"> <xs:complexType /> </xs:element> <xs:element name="protein_post_translation"> <xs:complexType /> </xs:element> <xs:element name="protein_transport_and_location"> <xs:complexType /> </xs:element> <xs:element name="gene_expression"> <xs:complexType /> </xs:element> <xs:element name="Others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> </xs:element> <xs:element name="Multiple_Molecular"> <xs:complexType> <xs:all>
<xs:element name="transcriptome"> <xs:complexType /> </xs:element> <xs:element name="proteome"> <xs:complexType /> </xs:element> <xs:element name="metabolome"> <xs:complexType /> </xs:element> <xs:element name="epigenome"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> </xs:element> </xs:all> </xs:complexType> <xs:complexType name="Phenome"> <xs:all> <xs:element name="deseases"> <xs:complexType /> </xs:element> <xs:element name="aging"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> <xs:complexType name="NetworksAndPathways">
<xs:all> <xs:element name="signal_transduction"> <xs:complexType /> </xs:element> <xs:element name="cellular_process"> <xs:complexType /> </xs:element> <xs:element name="metabolic_pathway"> <xs:complexType /> </xs:element> <xs:element name="molecular_interaction_network"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType /> </xs:element> </xs:all> </xs:complexType> <xs:complexType name="Structures"> <xs:all> <xs:element name="small_molecule_structure"> <xs:complexType /> </xs:element> <xs:element name="nucleic_acid_structure"> <xs:complexType /> </xs:element> <xs:element name="protein_structure"> <xs:complexType /> </xs:element> <xs:element name="carbohydrate_structure"> <xs:complexType /> </xs:element> <xs:element name="others"> <xs:complexType />
</xs:element> </xs:all> </xs:complexType> <xs:complexType name="Experiment_Documentation" /> <xs:complexType name="Terms_And_Nomenclatures" /> <xs:complexType name="Tools" /> <xs:complexType name="Others" /> </xs:schema>
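The schema above is a two-level classification: each top-level `xs:complexType` names a category, and the `xs:element` children of its `xs:all` group name the sub-categories. The Python sketch below shows one way to list that hierarchy with the standard library. It is illustrative only; `SCHEMA_EXCERPT` is a shortened copy of two types from the schema, and the function name `list_categories` is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened excerpt of the schema above, used only to keep the sketch self-contained.
SCHEMA_EXCERPT = """
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Phenome">
    <xs:all>
      <xs:element name="deseases"><xs:complexType /></xs:element>
      <xs:element name="aging"><xs:complexType /></xs:element>
      <xs:element name="others"><xs:complexType /></xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Tools" />
</xs:schema>
"""

def list_categories(schema_text):
    """Map each top-level complexType name to its direct sub-category names."""
    root = ET.fromstring(schema_text.strip())
    categories = {}
    for ctype in root.findall(f"{XS}complexType"):
        children = ctype.findall(f"{XS}all/{XS}element")
        categories[ctype.get("name")] = [el.get("name") for el in children]
    return categories

for name, subs in list_categories(SCHEMA_EXCERPT).items():
    print(name, subs)
```

Only the first level of nesting is listed; deeper levels such as the elements inside `Sequence_Features/DNA` would need a recursive walk.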
1
99
2.3.3
132
GenAtlas    http://www.genatlas.org/
Genetics Home Reference    http://ghr.nlm.nih.gov/
HAGR    http://genomics.senescence.info/
HCAD    http://www.pdg.cnb.uam.es/UniPub/HCAD/
HGMD    www.hgmd.cf.ac.uk
Human PAML Browser    http://mendel.gene.cwru.edu/adamslab/pbrowser.py
MSY Breakpoint Mapper    http://breakpointmapper.wi.mit.edu
MutDB    http://mutdb.org/
OMIM    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
SNP2NMD    http://variome.kobic.re.kr/SNP2NMD/
ALFRED    http://alfred.med.yale.edu
CTGA    http://www.cags.org.ae
Cypriot national mutation database    http://www.goldenhelix.org/cypriot/
Cytokine Gene Polymorphism Database    http://www.nanea.dk/cytokinesnps/
Database of Genomic Variants    http://projects.tcag.ca/variation/
dbQSNP    http://qsnp.gen.kyushu-u.ac.jp/
dbRIP    http://falcon.roswellpark.org:9090/
dbSNP    http://www.ncbi.nlm.nih.gov/SNP/
D-HaploDB    http://orca.gen.kyushu-u.ac.jp
FINDBase    http://www.findbase.org
F-SNP    http://compbio.cs.queensu.ca/F-SNP/
HapMap Project    http://snp.cshl.org
HGVbase    http://www.hgvbaseg2p.org/index
JSNP    http://snp.ims.u-tokyo.ac.jp/
PhenomicDB    http://www.phenomicdb.de
PolyDoms    http://polydoms.cchmc.org
Polymorphix    http://pbil.univ-lyon1.fr/polymorphix/query.php
Protein Mutant DB    http://pmd.ddbj.nig.ac.jp/
SNP@Ethnos    http://bioportal.kobic.re.kr/SNPatETHNIC/
SNPeffect & PupaSuite    http://snpeffect.vib.be/
TopoSNP    http://gila.bioengr.uic.edu/snp/toposnp
TPMD    http://tpmd.nhri.org.tw
Atlas of Genetics and Cytogenetics in Oncology and Haematology    http://atlasgeneticsoncology.org/
Cancer Chromosomes    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cancerchromosomes
CancerGenes    http://cbio.mskcc.org/cancergenes
CanGEM    http://www.cangem.org/
CGED    http://lifesciencedb.jp/cged/
COSMIC    http://www.sanger.ac.uk/perl/CGP/cosmic
Database of Germline p53 Mutations    http://www.lf2.cuni.cz/projects/germline_mut_p53.htm
EHCO    http://ehco.iis.sinica.edu.tw
HPTAA    http://www.bioinfo.org.cn/hptaa/
IARC TP53 Database    http://www-p53.iarc.fr/index.html
ITTACA    http://bioinfo.curie.fr/ittaca
MethyCancer    http://methycancer.psych.ac.cn/Index.do
OncoDB.HCC    http://oncodb.hcc.ibms.sinica.edu.tw
PubMeth    http://matrix.ugent.be/pubmeth/
SNP500Cancer    http://snp500cancer.nci.nih.gov
SV40 Large T-Antigen Mutant Database    http://supernova.bio.pitt.edu/pipaslab/
Tumor Associated Gene Database    http://www.binfo.ncku.edu.tw/TAG/GeneDoc.php
Tumor Gene Family Databases (TGDBs)    http://www.tumor-gene.org/tgdf.html
ALPSbase    http://www3.niaid.nih.gov/topics/ALPS/
AlzGene    http://www.alzgene.org
Androgen Receptor Gene Mutations DB    http://www.mcgill.ca/androgendb/
BGED    http://genome.mc.pref.osaka.jp/BGED/
BTKbase    http://bioinf.uta.fi/BTKbase/
CarpeDB    http://www.carpedb.ua.edu
CASRDB    http://www.casrdb.mcgill.ca/
Collagen Mutation DB    http://www.le.ac.uk/genetics/collagen/
EpoDB    http://www.cbil.upenn.edu/EpoDB/
GOLD.db    http://gold.tugraz.at
HaemB    http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html
HbVar    http://globin.cse.psu.edu/hbvar/
HDBase    http://hdbase.org/
HemBase    http://hembase.niddk.nih.gov/
HORDE    http://genome.weizmann.ac.il/horde/
HOX-PRO    http://www.iephb.nw.ru/labs/lab38/spirov/hox_pro/hox-pro00.html
HPMR    http://receptome.stanford.edu/
Human PAX2 Allelic Variant Database    http://pax2.hgu.mrc.ac.uk/
Human PAX6 Allelic Variant Database    http://pax6.hgu.mrc.ac.uk/
IL2Rgbase    http://research.nhgri.nih.gov/scid/
Imprinted Gene Catalogue    http://igc.otago.ac.nz/home.html
INFEVERS    http://fmf.igh.cnrs.fr/infevers
KinMutBase    http://www.uta.fi/imt/bioinfo/KinMutBase/
Lowe Syndrome Mutation Database    http://research.nhgri.nih.gov/lowe/
NCL Resource    http://www.ucl.ac.uk/ncl/
NEIbank    http://neibank.nei.nih.gov
PAHdb    http://www.pahdb.mcgill.ca
PGDB    http://www.ucsf.edu/pgdb/
PHEXdb    http://www.phexdb.mcgill.ca
RB1 Gene Mutation DB    http://www.verandi.de/joomla/
SCAdb    http://ymbc.ym.edu.tw/sca_ensembl/
T1Dbase    http://t1dbase.org
The Autism Chromosome Rearrangement DB    http://projects.tcag.ca/autism
The Lafora Database    http://projects.tcag.ca/lafora/
16S and 23S rRNA Mutation Database    http://ribosome.fandm.edu
ProTherm    http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html
PLPMDB    http://www.studiofmp.com/plpmdb/
Telomerase database    http://telomerase.asu.edu/
HIV Drug Resistance Database    http://www.hiv.lanl.gov/content/sequence/RESDB/
HIV Positive Selection Mutation Database    http://bioinfo.mbi.ucla.edu/HIV
SCMD    http://yeast.gi.k.u-tokyo.ac.jp/
FlyTrap    http://flytrap.med.yale.edu/
PathBase    http://www.pathbase.net/
Rice Mutant Database    http://rmd.ncpgr.cn/
HAMSTeRS    http://europium.csc.mrc.ac.uk/WebPages/Main/main.htm
HIV-RT    http://hivdb.stanford.edu/cgi-bin/PRMut.cgi
HvrBase    http://www.hvrbase.org/
Online Mendelian Inheritance in Animals    http://omia.angis.org.au/
KMDB    http://mutview.dmb.med.keio.ac.jp/MutationView/jsp/index.jsp
33
MmtDB    http://www.ba.cnr.it/~areamt08/MmtDBWW
Mutation Spectra Database    http://info.med.yale.edu/mutbase/
p53 Databases    http://metalab.unc.edu/dnam/mainpage.html
DRESH    http://www.tigem.it/LOCAL/drosophila/dros.html
MitBASE    http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl
Transgenic/Targeted Mutation Database    http://tbase.jax.org/
DT40    http://genetics.hpi.uni-hamburg.de/dt40.html
FLAGdb/FST    http://genoplante-info.infobiogen.fr
IDR    http://www.uta.fi/imt/bioinfo/idr/
HGBASE    http://hgbase.interactiva.de/
AGNS    http://emj-pc.ics.uci.edu/mgs/dbases/agns
Asthma and Allergy Database    http://cooke.gsf.de
Asthma Gene Database    http://cooke.gsf.de/asthmagen/main.cfm
GRAP Mutant Databases    http://tinyGRAP.uit.no/GRAP/
GeniSys    http://genisys.kaist.ac.kr:8080
T-REGs    No longer maintained
SynDB    http://syndb.cbi.pku.edu.cn
Prostate Expression DB    http://www.pedb.org/
PTCH1 Mutation DB    http://www.cybergene.se/cgi-bin/w3-msql/ptchbase/index.html
KBERG    http://research.i2r.a-star.edu.sg/kberg
HemoPDB    http://bioinformatics.wistar.upenn.edu/HemoPDB/
ERGDB    http://sdmc.lit.org.sg/ergdb/cgi-bin/explore.pl
EyeSite    http://eyesite.cryst.bbk.ac.uk/
DENIZ    No longer maintained
EICO DB    http://fantom2.gsc.riken.jp/EICODB/
AngioDB    http://angiodb.snu.ac.kr/
BayGenomics    http://baygenomics.ucsf.edu/
Oral Cancer Gene DB    Incorporated into TGDBs, no. 155
Human p53, human hprt, rodent lacI and rodent lacZ databases    http://www.ibiblio.org/dnam/mainpage.html
HGVS Databases    http://www.hgvs.org/
SNAP    http://platform.humgen.au.dk/
DG-CST    http://dgcst.ceinge.unina.it/
FESD    http://sysbio.kribb.re.kr/FESD/
1 2007
24
1995
SCI
[a] Shan Gao, Ning Zhang, You G Duan, Zhuo Yang, Ji S Ruan, Tao Zhang. Prediction of function changes associated with single-point protein mutations using support vector machines (SVMs) [J]. Human Mutation, 2009, 30(8): 1161-1166. (IF = 7.033)
[b] Ning Zhang, Jishou Ruan, Guangyou Duan, Shan Gao, Tao Zhang. The Interstrand Amino Acid Pairs Play a Significant Role in Determining the Parallel or Antiparallel Orientation of beta-Strands. Biochemical and Biophysical Research Communications, 2009, 386: 537-543. (IF = 2.648)
[c] Ning Zhang, Guangyou Duan, Shan Gao, Jishou Ruan, Tao Zhang. Prediction of the Parallel/Antiparallel Orientation of Beta-Strands Using Amino Acid Pairing Preferences and Support Vector Machines. Journal of Theoretical Biology, 2010, 263(3): 360-368. (IF = 2.454)
[d] ZHANG Ning, GAO Shan, DUAN Guangyou, YANG Zhuo, ZHANG Tao. SRD: a universal software tool for DNA/Protein sequence relationship visualization based on undirected graphs. Journal of Biomedical Informatics, 2010, accepted. (IF = 1.924)
EI-indexed papers (2)
[e] Guang-You Duan, Shan Gao, Ning Zhang, Zhuo Yang, Tao Zhang. Component Vector method and its application in detecting similarities between sequences. Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on, 11-13 June 2009, pp. 1-3. DOI 10.1109/ICBBE.2009.5162547
[f] Ning Zhang, Shan Gao, Guang-You Duan, Zhuo Yang, Tao Zhang. StrandPairsViewer: a toolkit for visualization and analysis of amino acids pairs in protein sheet structures. Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on, 11-13 June 2009, pp. 1-4. DOI 10.1109/ICBBE.2009.5163427
3
[g] Shan Gao, Ning Zhang, You G Duan, Tao Zhang. Identifying non-neutral amino acid substitutions by SVMs.
[h] 24 [i] . Visual Basic 5722-5725. . . 2010 2009 25 17. . 2009 30
2
[j] Shan Gao, Ning Zhang, You G Duan, Zhuo Yang, Ji S Ruan, Tao Zhang. HERG: A Model to Describe, Interoperate and Study molecular biology Databases, 2010: under review.
[k] Guangyou Duan, Ning Zhang, Shan Gao, Jishou Ruan, Tao Zhang. Improved splice site prediction using sequence information and singular value decomposition. Journal of Theoretical Biology, 2010: under review.