Multiple-segment scan (figure)
A difficult test
Average domain size ~80 residues
Distribution of families in folds (from the point of view of structure space)
The 9 superfolds
The large majority of sequences belong to superfolds (Orengo 2005)
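The superfold effect can be made concrete with a minimal sketch. It takes a mapping from each family to its fold, counts families per fold, and reports what fraction of families fall in the most populated folds. The function name `fold_coverage` and the family/fold labels in the example are hypothetical and do not come from any classification database.

```python
from collections import Counter


def fold_coverage(family_to_fold: dict, top_n: int) -> float:
    """Fraction of families that fall in the top_n most populated folds."""
    if not family_to_fold:
        return 0.0
    # Number of families assigned to each fold.
    families_per_fold = Counter(family_to_fold.values())
    # Folds sorted from most to least populated.
    ranked = families_per_fold.most_common()
    covered = sum(count for _, count in ranked[:top_n])
    return covered / len(family_to_fold)


# Hypothetical toy data: 8 families spread over 5 folds.
example = {
    "fam1": "fold_A", "fam2": "fold_A", "fam3": "fold_A",
    "fam4": "fold_B", "fam5": "fold_B",
    "fam6": "fold_C", "fam7": "fold_D", "fam8": "fold_E",
}
print(fold_coverage(example, 2))  # 0.625
```

With real fold assignments, a large value for `top_n=9` is the pattern the superfold slide describes. Here, 2 of 5 folds already hold 5 of the 8 families.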
Why superfolds?
Branden & Tooze 1991 Introduction to Protein Structure. Garland Publishing, New York
Russell et al. 1998 J. Mol. Biol. 282, 903-918
Why supersites?
- Helix dipoles
- Main-chain atoms
- Proximity of loops
- Number of loops
How many folds? (Chothia 1992)
PDB (structure database): 120 folds
   25% of sequences with > 25% sequence identity
SwissProt (sequence database): ~500 folds (400)
   30% of sequences with > 25% sequence identity
Genomes: ~1500 folds (1000)
Coulson & Moult 2002 Proteins Struct. Funct. Genet. 46, 61-71
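One simple way to read fold-count extrapolations is sketched below. It rests on a stated assumption, which is not necessarily how the estimates above were derived. Suppose sequences are spread evenly over all N folds and k folds are already known. Then the fraction f of sequences that match a known fold is about k/N, which gives N ≈ k/f. The function name `extrapolate_fold_count` is hypothetical.

```python
def extrapolate_fold_count(known_folds: int, fraction_matched: float) -> float:
    """Naive total-fold estimate: known_folds / fraction_matched.

    Hypothetical assumption: sequences are spread evenly over all folds, so the
    fraction of sequences matching a known fold equals the fraction of folds
    that are known. Superfolds violate this assumption.
    """
    if not 0.0 < fraction_matched <= 1.0:
        raise ValueError("fraction_matched must be in (0, 1]")
    return known_folds / fraction_matched


# Illustration only, plugging in the 120 folds and 25% figures from the slide:
print(extrapolate_fold_count(120, 0.25))  # 480.0
```

The best-known folds tend to be the most populated ones. As a result, the matched fraction would overstate the share of folds already known, and this naive estimator would tend to undercount. Uneven fold usage of this kind is exactly what the superfold picture describes.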
Structures that map into the same vector are called DEGENERATE
57,337 structures; 30,408 strings; 18,213 nondegenerate structures
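The three counts on this slide can be computed with a short sketch. It assumes, hypothetically, that each structure has already been reduced to a string, for example a binary pattern with one character per site. A structure is nondegenerate when no other structure maps to the same string. The function name `degeneracy_stats` and the toy strings are illustrative only.

```python
from collections import Counter


def degeneracy_stats(structure_strings: list) -> tuple:
    """Return (number of structures, distinct strings, nondegenerate structures).

    structure_strings holds one string per structure (hypothetical encoding).
    """
    # How many structures map onto each string.
    counts = Counter(structure_strings)
    n_structures = len(structure_strings)
    n_strings = len(counts)
    # Strings hit by exactly one structure identify nondegenerate structures.
    n_nondegenerate = sum(1 for c in counts.values() if c == 1)
    return n_structures, n_strings, n_nondegenerate


# Toy input: 6 structures mapping onto 4 distinct strings.
print(degeneracy_stats(["1100", "1010", "1100", "0110", "1010", "0011"]))  # (6, 4, 2)
```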
Random sample of sequence space: 20 × 10^6 sequences
Exhaustive search of conformational space
Few structures with high designability
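Designability is usually defined as the number of sequences whose unique lowest-energy structure is a given structure. The sketch below illustrates this in a toy setting, under a hypothetical energy function that is not the model behind the figures above. Sequences and structures are binary strings of equal length, and a sequence's energy in a structure is the Hamming distance between the two. For each sequence, the code searches all structures exhaustively. Sequences whose minimum is tied count for no structure. The function name `designability` is illustrative.

```python
from collections import Counter
from itertools import product


def designability(structures: list) -> dict:
    """Number of binary sequences whose unique best structure is each entry.

    Toy assumptions: structures are distinct binary strings of equal length,
    and the 'energy' of a sequence in a structure is their Hamming distance.
    """
    length = len(structures[0])
    counts = Counter({s: 0 for s in structures})
    # Enumerate every binary sequence of the given length (2**length of them).
    for bits in product("01", repeat=length):
        seq = "".join(bits)
        # Exhaustive search over all structures for this sequence.
        energies = [sum(a != b for a, b in zip(seq, s)) for s in structures]
        best = min(energies)
        # Only a unique ground state makes the sequence "design" a structure.
        if energies.count(best) == 1:
            counts[structures[energies.index(best)]] += 1
    return dict(counts)


print(designability(["000", "011", "111"]))  # {'000': 2, '011': 1, '111': 3}
```

Enumerating all 2^L sequences is only feasible for short strings. For longer ones, a random sample of sequences could replace `product`, in the spirit of the random sample of sequence space above. Even in this tiny example the structures are unequally designable.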