Have a look at Fig. 1 and try the following queries to see the relations in SCOP and the content of the three tables you will use in the exercises (cla, des, astral):

mysql> SELECT * FROM cla LIMIT 1;
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+
| sid     | pdb_id | sccs    | cl    | cf    | sf    | fa    | dm    | sp    | px    |
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+
| d1dlwa_ | 1dlw   | a.1.1.1 | 46456 | 46457 | 46458 | 46459 | 46460 | 46461 | 14982 |
+---------+--------+---------+-------+-------+-------+-------+-------+-------+-------+

mysql> SELECT * FROM des LIMIT 1;
+-------+------+------+------+--------------------+
| id    | type | sccs | sid  | description        |
+-------+------+------+------+--------------------+
| 46456 | cl   | a    | -    | All alpha proteins |
+-------+------+------+------+--------------------+

mysql> SELECT * FROM astral LIMIT 1;
+---------+---------+-----------------------------------------------------------+
| sid     | sccs    | seq                                                       |
+---------+---------+-----------------------------------------------------------+
| d1dlwa_ | a.1.1.1 | slfeqlggqaavqavtaqfyaniqadatvatffngidmpnqtnktaaflcaalgg...|
+---------+---------+-----------------------------------------------------------+

Figure 1: Entity relationship diagram for SCOP. You will use the tables cla, des and astral. Each row in the cla table contains one unique domain with pointers to the PDB structure it comes from (cla.pdb_id), the species it belongs to (cla.sp), its class (cla.cl), fold (cla.cf), superfamily (cla.sf), family (cla.fa) and domain (cla.dm). Each entry has a unique ID for the domain (cla.px). There is also an ID for any SCOP entry, called cla.sid. Another ID is cla.sccs, which points to the family of the domain. In contrast to cla.fa, which is a number, cla.sccs is a string, for example a.4.5.1. The class can also be read directly from the sccs (a = all alpha, b = all beta, etc.).
However, it is more efficient to use the cla.fa ID for families, because the database processes numbers faster than strings. For large queries this makes a difference. Each row in des contains a string describing the entry. The attribute des.description contains this description, and des.type is a two-character string indicating the type of the entry (cl, cf, sf, fa, dm, px, sp). The attribute des.id is an ID that links the des entries to the cla entries cla.cl, cla.cf, cla.sf, cla.fa, cla.dm, cla.px and cla.sp. For example, the species ID cla.sp=46475 in the cla table occurs in the des table as des.id=46475 and has the description "Human (Homo sapiens)". Each des entry also has a des.sccs (the family sccs, which also occurs in cla) and a des.sid, which is not always used. There are three attributes in the astral table: astral.sid (the ID pointing to the corresponding entries cla.sid and des.sid), astral.seq (a string with the entry's amino acid sequence), and astral.sccs (pointing to the entry's family SCCS). (image by Boris Vassilev)
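The links described in the caption can be tried out with two small joins. This is only a sketch: it uses the table and column names shown above, takes the domain d1dlwa_ and the family ID 46459 from the sample cla row, and assumes that the domain has a matching row in astral. The aliases c, fam, sp and a are chosen here for illustration.

```sql
-- Resolve the numeric family and species IDs of one domain
-- into their readable descriptions via des.id.
SELECT c.sid,
       c.sccs,
       fam.description AS family,
       sp.description  AS species
FROM cla AS c
JOIN des AS fam ON fam.id = c.fa   -- family ID  -> family description
JOIN des AS sp  ON sp.id  = c.sp   -- species ID -> species description
WHERE c.sid = 'd1dlwa_';

-- Fetch the sequences of all domains in one family.
-- The filter uses the numeric cla.fa rather than the sccs string.
SELECT c.sid, c.pdb_id, a.seq
FROM cla AS c
JOIN astral AS a ON a.sid = c.sid  -- astral.sid points to cla.sid
WHERE c.fa = 46459;
```

The des table is joined twice in the first query because every level of the hierarchy (cl, cf, sf, fa, dm, sp, px) is stored in the same table and distinguished only by its ID and des.type.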