
Continuous and discontinuous domains

What are possible parameters for domain definition/detection?


The split value

Principle: residues comprising a domain make more contacts between themselves (internal contacts) than to the rest of the protein (external contacts).

Split value: (intA/extAB)×(intB/extAB)

(Siddiqui & Barton 1995 Prot. Sci. 4, 872-884)
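The split value formula above can be sketched in a few lines. This is only an illustration, assuming contacts are already available as a list of residue index pairs (the slide does not say how a contact is defined) and that the chain is cut once into an N-terminal part A and a C-terminal part B. The names `split_value` and `scan_single_cut`, the toy contact list, and the choice to return infinity when there are no external contacts are all hypothetical.

```python
def split_value(contacts, cut):
    """Split value (intA/extAB) * (intB/extAB) for one cut.

    contacts: iterable of (i, j) residue index pairs (assumed input format).
    cut: index of the first residue of part B; part A is every index < cut.
    """
    int_a = int_b = ext_ab = 0
    for i, j in contacts:
        in_a_i, in_a_j = i < cut, j < cut
        if in_a_i and in_a_j:
            int_a += 1          # both residues in A
        elif not in_a_i and not in_a_j:
            int_b += 1          # both residues in B
        else:
            ext_ab += 1         # contact across the cut
    if ext_ab == 0:
        # Assumption: two parts with no contacts between them split perfectly.
        return float("inf")
    return (int_a / ext_ab) * (int_b / ext_ab)


def scan_single_cut(contacts, n_residues):
    """Split value for every possible single cut position (a split value profile)."""
    return {cut: split_value(contacts, cut) for cut in range(1, n_residues)}


# Toy protein of 6 residues: two triangles of contacts joined by one contact.
toy_contacts = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
profile = scan_single_cut(toy_contacts, 6)
best_cut = max(profile, key=profile.get)
print(best_cut, profile[best_cut])  # 3 9.0
```

The profile peaks where the two parts are internally well connected but share few contacts, which is the behaviour the principle above describes.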


An example of a split value profile

[Figure: split value profile, with residue positions 1, 97 and 170 labeled]
Multiple-segment scan
A difficult test
Average domain size ~80 residues
Distribution of families in folds
(from the point of view of structure space)

The large majority of folds are unifolds


Distribution of families in folds
(from the point of view of sequence space)

The 9 superfolds

The large majority of sequences belong to superfolds (Orengo 2005)
Why superfolds?

- Regularity => folding
- Regularity => evolution
- Sliding => mutational stability
- Physico-chemical constraints

"Fold space attractors"


Supersites in superfolds

Branden & Tooze 1991 Introduction to Protein Structure. Garland Publishing, New York
Russell et al. 1998 J. Mol. Biol. 282, 903-918
Why supersites?

- Helix dipoles
- Main-chain atoms
- Proximity of loops
- Number of loops
How many folds? (Chothia 1992)

PDB (structure database): 120 folds
    25% of sequences with > 25% identity (IDE)
SwissProt (sequence database): ~ 500 folds (400)
    30% of sequences with > 25% identity (IDE)
Genomes: ~ 1500 folds (1000)

(Values in parentheses: correction for ~ 80% efficiency in detection of structural homology)

Chothia 1992 Nature 357, 543-544
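The extrapolation behind these numbers can be followed as simple arithmetic. The sketch below is one reading of the slide, not necessarily the exact calculation in the paper: it assumes each step divides the known number of folds by the fraction of sequences that match the smaller database, and that the ~80% detection efficiency is applied by dividing the observed fraction by 0.8. The function name `extrapolate_folds` and its parameters are hypothetical.

```python
def extrapolate_folds(known_folds, observed_fraction, efficiency=1.0):
    """Scale a fold count up to a larger database.

    observed_fraction: share of sequences in the larger set that are
        detectably related to the smaller set.
    efficiency: assumed fraction of true relationships that are detected.
    """
    true_fraction = observed_fraction / efficiency
    return known_folds / true_fraction


# Uncorrected: PDB -> SwissProt -> genomes
swissprot_raw = extrapolate_folds(120, 0.25)
genomes_raw = extrapolate_folds(swissprot_raw, 0.30)

# Corrected for ~80% efficiency in detecting structural homology
swissprot_corr = extrapolate_folds(120, 0.25, efficiency=0.8)
genomes_corr = extrapolate_folds(swissprot_corr, 0.30, efficiency=0.8)

print(round(swissprot_raw), round(genomes_raw))    # 480 1600
print(round(swissprot_corr), round(genomes_corr))  # 384 1024
```

Under these assumptions the results land close to the rounded slide values (~500 and ~1500 uncorrected, 400 and 1000 corrected).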


A re-estimation of the total number of folds
Current databases
Consider skewed distribution of families in folds

Coulson & Moult 2002 Proteins Str. Funct. Gen. 46, 61-71

Wolf et al. estimate 1000 folds (2000, J. Mol. Biol. 299, 897-905)


Rate of discovery of new folds

(one point per year)


Are protein structures atypical?
The concept of designability

"The designability of a structure is defined as the number of sequences that possess that structure as their nondegenerate ground state."

=> Natural structures are highly designable

(Strong degeneracy of the structural code, i.e. many different sequences can have the same structure. See in particular the example of superfolds.)

Which structural features correlate with high designability?

(PNAS 1998, 95, 4987-4990)


Ideally, I would need to know...
Structure space
Sequence space
Energy function

Simplified lattice models to allow for exhaustive search

Explore sequence and structure space on 6x6 lattice


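Sequence vector $h = (h_{\sigma_1}, h_{\sigma_2}, \ldots, h_{\sigma_N})$ (hydrophobicity)
Structure vector $s = (s_1, s_2, \ldots, s_N)$ (burial)
$E = -\sum_{i=1}^{N} s_i h_{\sigma_i}$
Simplify:
$s_i = 0$ (surface); $s_i = 1$ (buried)
$h_{\sigma_i} = 0$ (P); $h_{\sigma_i} = 1$ (H)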
Structure vectors on 6x6 lattice

Structures that map into the same vector are called DEGENERATE
57,337 structures; 30,408 strings; 18,213 nondegenerate structures
Random sample of sequence space: $20 \times 10^{6}$ sequences
Exhaustive search of conformational space
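To make the counting concrete, here is a toy version of the designability calculation. It is a sketch under strong assumptions: instead of the 6x6 lattice it uses three made-up structure vectors of length 4 (each with two buried sites), and it enumerates all binary sequences rather than sampling them. The names `energy`, `designability` and `toy_structures` are hypothetical; the energy is the simplified E = -Σ s_i h_i with binary s and h.

```python
from itertools import product


def energy(h, s):
    """Simplified energy: each hydrophobic residue (h=1) on a buried site (s=1) lowers E by 1."""
    return -sum(s_i * h_i for s_i, h_i in zip(s, h))


def designability(structures):
    """Count, per structure, the sequences that have it as their unique ground state."""
    n = len(structures[0])
    counts = [0] * len(structures)
    for h in product((0, 1), repeat=n):       # every binary HP sequence
        energies = [energy(h, s) for s in structures]
        lowest = min(energies)
        if energies.count(lowest) == 1:       # skip degenerate ground states
            counts[energies.index(lowest)] += 1
    return counts


# Hypothetical structure vectors (1 = buried, 0 = surface), not real lattice folds.
toy_structures = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]
print(designability(toy_structures))  # [3, 1, 3]
```

Even in this tiny example the counts differ between structures, and 9 of the 16 sequences are discarded because their lowest energy is shared by more than one structure.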
Few structures with high designability

Low designability        High designability

Clustered structure      Isolated structure
"Random-walk"-like       Geometric regularity
Highly designable structures are not compatible with local changes

Highly designable structures share features of natural structures

- Resistance toward sequence changes (designability)
- Thermodynamic stability (nondegenerate ground state)
- Regular structural motifs (secondary structure)
- Many surface-to-core transitions
- High contact order
- Difficult to transform by local changes (isolated in structure space)
Contact order

(Plaxco et al. 1998, JMB 277, 985-994)
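Relative contact order is the average sequence separation between contacting residues, normalized by chain length: CO = (1 / (L·N)) Σ ΔS_ij, where N is the number of contacts, L the number of residues and ΔS_ij the sequence separation of contact i-j. A minimal sketch, assuming the contacts are supplied as residue index pairs (the distance cutoff used to define a contact is left out here); the function name and toy data are hypothetical.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of contacts, divided by chain length."""
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (n_residues * len(contacts))


# Toy example: 6 residues, three contacts with separations 3, 1 and 5.
toy_contacts = [(0, 3), (1, 2), (0, 5)]
print(relative_contact_order(toy_contacts, 6))  # 0.5
```

Structures dominated by local contacts (for example within a helix) give low values, while many contacts between residues far apart in sequence give high values.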
