
Introduction to SCOP

[Lab 3, Supplementary material]


Have a look at Fig. 1 and try the following queries to see the relations
in SCOP and the contents of the three tables you will use in the exercises
(cla, des, astral):
mysql> SELECT * FROM cla LIMIT 1;
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+
| sid     | pdb_id | sccs    | cl    | cf    | sf    | fa    | dm    | sp    | px    |
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+
| d1dlwa_ | 1dlw   | a.1.1.1 | 46456 | 46457 | 46458 | 46459 | 46460 | 46461 | 14982 |
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+
mysql> SELECT * FROM des LIMIT 1;
+-------+------+------+------+--------------------+
| id    | type | sccs | sid  | description        |
+-------+------+------+------+--------------------+
| 46456 | cl   | a    | -    | All alpha proteins |
+-------+------+------+------+--------------------+
mysql> SELECT * FROM astral LIMIT 1;
+---------+---------+-----------------------------------------------------------+
| sid     | sccs    | seq                                                       |
+---------+---------+-----------------------------------------------------------+
| d1dlwa_ | a.1.1.1 | slfeqlggqaavqavtaqfyaniqadatvatffngidmpnqtnktaaflcaalgg...|
+---------+---------+-----------------------------------------------------------+
Figure 1: Entity relationship diagram for SCOP. You will use the tables cla, des and astral. Each row in the cla table
describes one unique domain, with pointers to the PDB structure it comes from (cla.pdb_id), the species it belongs to (cla.sp),
and its class (cla.cl), fold (cla.cf), superfamily (cla.sf), family (cla.fa) and domain (cla.dm). Each entry has a unique ID for
the domain (cla.px). There is also an ID for any SCOP entry, called cla.sid. Another ID is cla.sccs, which points to the family
of the domain. In contrast to cla.fa, which is a number, cla.sccs is a string such as a.4.5.1. The class can be read directly
from the sccs (a = all alpha, b = all beta, etc.). However, it is more efficient to use the cla.fa ID for families, because the
database processes numbers faster than strings; for large queries this makes a difference. Each row in des contains a string
describing an entry. The attribute des.description holds this description, and des.type is a two-character string indicating
the type of the entry (cl, cf, sf, fa, dm, px, sp). The attribute des.id links the des entries to the cla columns cla.cl, cla.cf,
cla.sf, cla.fa, cla.dm, cla.px and cla.sp. For example, the species ID cla.sp=46475 in the cla table occurs in the des table as
des.id=46475 and has the description Human (Homo sapiens). Each des entry also has des.sccs, the family sccs, which also
occurs in cla, and des.sid, which is not always used. The astral table has three attributes: astral.sid (the ID pointing to the
corresponding entries cla.sid and des.sid), astral.seq (a string with the entry's amino acid sequence), and astral.sccs
(pointing to the entry's family SCCS). (image by Boris Vassilev)
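
The link between cla and des described in the caption can be tried out with a small sketch. It is not part of the lab setup: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the lab's MySQL server, the column types are assumptions, and the only rows loaded are the two printed by the queries above. The sketch resolves the numeric class ID of a domain (cla.cl) to its description through des.id, and then reads the class letter directly from the sccs string.

```python
import sqlite3

# In-memory SQLite stands in for the lab's MySQL server (an assumption).
# Column types are guesses; the rows are the ones shown in the handout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cla (sid TEXT, pdb_id TEXT, sccs TEXT,
                  cl INTEGER, cf INTEGER, sf INTEGER, fa INTEGER,
                  dm INTEGER, sp INTEGER, px INTEGER);
CREATE TABLE des (id INTEGER, type TEXT, sccs TEXT, sid TEXT,
                  description TEXT);
INSERT INTO cla VALUES ('d1dlwa_', '1dlw', 'a.1.1.1',
                        46456, 46457, 46458, 46459, 46460, 46461, 14982);
INSERT INTO des VALUES (46456, 'cl', 'a', '-', 'All alpha proteins');
""")

# Join on the numeric ID: des.id matches cla.cl for the class of a domain.
query = """
SELECT cla.sid, cla.sccs, des.type, des.description
FROM cla
JOIN des ON des.id = cla.cl
WHERE cla.sid = ?
"""
row = conn.execute(query, ("d1dlwa_",)).fetchone()
print(row)  # ('d1dlwa_', 'a.1.1.1', 'cl', 'All alpha proteins')

# The class letter is the first dot-separated field of the sccs string.
class_letter = row[1].split(".")[0]
print(class_letter)  # a
```

At the mysql> prompt the same SELECT should work with the literal value 'd1dlwa_' in place of the ? placeholder. Joining on cla.cf, cla.sf, cla.fa, cla.dm, cla.sp or cla.px instead of cla.cl would look up the other levels in the same way.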