
Advanced Software Development Workshop 2009

Mechanobiology Research Center of Excellence Research Center of Excellence in MECHANOBIOLOGY

Lecture 10: Introduction to MySQL


Amy Ticoll 9:00-11:00 8 Dec 2009 S1A Computer Room, Level 4

With funding from:


Objectives

1. Learn the basics of relational databases
2. Learn how to use MySQL
3. Learn how to use the Structured Query Language (SQL)


Outline

Why are databases important in bioinformatics?
Brief background in databases
Introduction to the Structured Query Language
A worked example: a sequence database in MySQL


What is a database?


Collection of information
  Spreadsheet
  Filing cabinet
  Oracle database

Biology abounds with collections of data


Tsunami, deluge, avalanche, flood

Databases help us efficiently organise, integrate and query data in order to make scientific inferences

http://bioteach.ubc.ca


Databases and bioinformatics (2004 data)

Nucleotide records                      36,653,899
Protein sequences                       4,436,362
3D structures                           19,640
Interactions & complexes                52,385
Human Unigene Cluster                   118,517
Maps and Complete Genomes               6,948
Different taxonomy Nodes                283,121
Human dbSNP                             13,179,601
Human RefSeq records                    22,079
bp in Human Contigs > 5,000 kb (116)    2,487,920,000
PubMed records                          12,570,540
OMIM records                            15,138


Molecular biology needs databases!

High volume + complex data structures =

HELP!


RELATIONAL DATABASES


Relational Databases

A brief history
Developed by E.F. Codd (IBM), 1969-70
  Died 2003
  Received the Turing Award for his work (the computer science equivalent of the Nobel Prize)
Developed 12 rules to define a relational database, which call for a language to define, manipulate and query the data in the database
  One rule led to the Structured Query Language (SQL), which is used in virtually every RDBMS on the market
    ANSI standard (92, 99)

SQL


Relational Model

All data stored in tables
Table is a relation made up of columns (fields) and rows (records)
Intersection of a column and a row is a typed value
  Integer, Real, Varchar, Text, Blob, etc.
Operations on tables produce tables


Advantages of the relational model

Data independence
  Shielding the data from the application
Efficiency
  Storage, retrieval, integration
Data integrity/security
  Constraints, access controls


ACID test
In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. An example of a transaction is a transfer of funds from one bank account to another, even though it might consist of multiple individual operations (such as debiting one account and crediting another).

Atomicity
  All-or-nothing transactions
  If one operation fails, all fail

Consistency
  Data integrity constraints

Isolation
  Every transaction has a consistent view of the database regardless of what other transactions are being processed

Durability
  Once a transaction is complete, the newly updated data will survive failures of any kind
  Logs
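
The bank transfer mentioned above can be sketched in a few lines. This is only an illustration of atomicity, not part of the original lecture: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, and the Account table, the names and the balances are all made up. A CHECK constraint plays the role of the integrity rule, and the second transfer fails halfway through, so both of its updates are rolled back.

```python
import sqlite3

# In-memory SQLite database; the Account table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Account"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit would go negative
print(ok, failed, balances(conn))
```

In the failed transfer the credit to bob is applied first and then undone, which is the "if one operation fails, all fail" behaviour.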

Research fuelled by corporate databases gives us great technology for biological science
  30+ years of research into robust systems
  Industry standards for databases
  Vendors committed to high-quality products
    Oracle, DB2, Sybase, MS SQL Server, etc.
  Emergence of the internet and database-driven web content set the stage for bioinformatics
  Data mining tools for creating statistical associations
    Diapers and beer? (Teradata, a division of NCR Corporation)


What drives a database?

SQL

SQL

Structured Query Language (ANSI 92,99)


Used in virtually every RDBMS product

Has operations for:
  Creating tables
  Modifying tables
  Relating tables
  Inserting data
  Updating data
  Retrieving sets of data
  Deleting sets of data
  Deleting tables
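
As a rough tour of these operations, the sketch below runs one statement of each kind against a throwaway table. It assumes Python's built-in sqlite3 module instead of a MySQL server, and the Gene table and its rows are invented for the illustration; the statements themselves are plain SQL of the kind covered in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Creating a table (hypothetical Gene table, not part of the lecture schema)
cur.execute("CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL)")
# Modifying a table
cur.execute("ALTER TABLE Gene ADD COLUMN organism TEXT")
# Inserting data
cur.execute("INSERT INTO Gene (symbol, organism) VALUES ('LAF1', 'S. cerevisiae')")
cur.execute("INSERT INTO Gene (symbol, organism) VALUES ('ACT1', 'S. cerevisiae')")
# Updating data
cur.execute("UPDATE Gene SET organism = 'Saccharomyces cerevisiae'")
# Retrieving sets of data
rows_after_update = cur.execute(
    "SELECT symbol, organism FROM Gene ORDER BY symbol"
).fetchall()
# Deleting sets of data
cur.execute("DELETE FROM Gene WHERE symbol = 'ACT1'")
rows_after_delete = cur.execute("SELECT symbol FROM Gene").fetchall()
# Deleting the table
cur.execute("DROP TABLE Gene")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(rows_after_update, rows_after_delete, tables_left)
```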


SQL

WARNING: not all implementations are consistent

MySQL CREATE TABLE statements != PostgreSQL CREATE TABLE statements


Commercial RDBMS

Oracle
  According to Forbes, Larry Ellison is the 9th richest person in the US ($18 billion)
DB2
  IBM's solution
  Free for academics
Microsoft SQL Server
  For Windows


Open Source RDBMS

PostgreSQL
http://www.postgresql.org/
"The world's most advanced Open Source database software"
Began in 1986 at UC Berkeley
For many years considered the most sophisticated open source RDBMS
Performance?
Comes with most Linux distros
Small but loyal user community


MySQL

http://www.mysql.com/
"The world's most popular open source database"
  > 5,000,000 active installations
Easy to use
Very fast retrieval due to architecture
Considered by many to be a toy database
  For years no row-level locking
  Did not handle transactions well


MySQL
Free
  As in free beer
  As in free speech
  Dual license
    Commercial: http://www.mysql.com/products/licensing/commercial-license.html
    Open Source: http://www.mysql.com/products/licensing/opensource-license.html

Fast
  Extremely fast reads for certain table types
  Often outperforms other RDBMSs for reads

Functional
  Ease of use
  APIs in Perl, C, C++, Java
  Client/server architecture
  Works well with Apache/PHP for a very popular open source dynamic web solution


MySQL examples in bioinformatics

Being free, fast and functional has made MySQL pervasive in bioinformatics:

Ensembl (http://www.ensembl.org)
  Automated eukaryotic annotation database
Gene Ontology (http://www.geneontology.org)
  Controlled vocabulary for genes and functions
UCSC Genome Browser (http://genome.ucsc.edu)
  Human and other genome browser
BASE (http://base.thep.lu.se)
  BioArray Software Environment: a web-based database solution for microarrays


Worked example: a relational model for sequences and features

Create a relational model
  Tables to store:
    Data: sequence strings
    Meta-data (data about the data): features and their locations
Insert some records
Query the data to pull out useful subsets


Creating a Relational Database

Start with a data set
Divide data set into records
  The data
Divide records into useful fields that describe the particular record
  The meta-data
Create a model based on the useful fields
Create a database from the model
Insert the data into the database
The data is now computable


Example GenBank sequence record

Slide callouts: LOCUS, DEFINITION, ACCESSION, gene, Data
Highlighted values: 2075 bp, L32174, 26-APR-2004, <1..1204 /gene="LAF1"

LOCUS       YSCITRSA2    2075 bp    DNA    linear    PLN 26-APR-2004
DEFINITION  Saccharomyces cerevisiae isoleucyl tRNA synthetase (LAF1) gene,
            partial cds; and unknown gene.
ACCESSION   L32174
VERSION     L32174.1  GI:46561769
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 2075)
  AUTHORS   Chen,E. and Bretscher,A.P.
  TITLE     The LAF1 open reading frame encodes a second isoleucyl tRNA
            synthetase in the yeast Saccharomyces cerevisiae
  JOURNAL   Unpublished
FEATURES             Location/Qualifiers
     source          1..2075
                     /organism="Saccharomyces cerevisiae"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4932"
     gene            <1..1204
                     /gene="LAF1"
     CDS             <1..1204
                     /gene="LAF1"
                     /note="disruption results in an abnormal actin
                     cytoskeleton; putative"
                     /codon_start=2
                     /product="isoleucyl tRNA synthetase"
                     /protein_id="AAT01099.1"
                     /db_xref="GI:46561770"
                     /translation="SLKLSKLPSPLYQVCLEGSDQHRGWFQSSLLTKVASSNVPVAPY
                     EEVITHGFTLDENGLKMSKSVGNTISPEAIIRGDENLGLPALGVVGLRYLIAHSNFTT
                     DIVAGPTVMKHVGEALKKVRTNFRYLLSNLQKSQDFNLLPIEQLRRVDQYTLYKINEL
                     LETTREHYQKYNFSKVLITLQYHLNNELSAFYFDISKDILYSNQISWSWQEGRSNNAC
                     PYTNAYRAILAPILPVMVQEVWKYIPEGWLQGQEHIDINPMRGKWPFLDSNTEIVTSF
                     ENFELKILKQFQEEFKRLSLEEGVTKTTHSHVTIFTKHHLPFSSDELCDILQSSAVDI
                     LQMDSNNNSHPTIELGRGINVQILVNVQILVERSKRHNCPRCWKANSAEEDKLCDRCK
                     EAVDHLMS"
     CDS             1452..2075
                     /note="putative"
                     /codon_start=1
                     /product="unknown"
                     /protein_id="AAT01100.1"
                     /db_xref="GI:46561771"
                     /translation="MTVMNLFFRPCQLQMGSGPLELMLKRPTQLTTFMNTRPGGSTQI
                     RFISGNLDPVKRREDRLRKIFSKSRLLTRLNKNPKFSHYFDRLSEAGTVPTLTSFFIL
                     HEVTANTTTVLLWWLLYNLDLSDDFKLPNFLNGLMDSCHTAMEKFVGKRYQECLNKNK
                     LILSGTVAYVTVKLLYPVRIFISIWGAPYFGKWLLLPFQKLKHLIKK"
ORIGIN
        1 aagcttaaag ttgtcaaaac tcccatcccc cctgtaccaa gtttgtctag aaggatctga
       61 tcaacataga ggatggtttc aaagttcact gctaacaaaa gtagcatcaa gtaatgtccc
      121 tgttgcacca tatgaagaag tgattactca tggttttacc ctagatgaga atggtctgaa
      181 aatgtcaaaa tctgtgggaa atacaatttc tcccgaagca ataattcgag gcgatgaaaa
      241 cttaggctta ccagctttgg gtgttgtagg cttgaggtat ctgatagcac attcgaattt
      301 cacaactgat atagttgctg gcccgactgt gatgaaacat gtaggagaag ctctaaaaaa
      361 ggttaggact aactttcgct atttattgag taatttacag aagtcccaag atttcaacct
      421 tttgccgatt gaacaattac gccgtgttga tcaatatacc ttgtataaga taaacgaact
      481 gctggaaacg acgagagaac actaccaaaa gtacaacttt tccaaggttc tcattactct
      541 acaatatcat ttaaataacg agctatcggc gttttatttt gatatctcaa aggatatttt
      601 atattccaac caaatatctt ggtcatggca agaaggcagg tcaaacaacg cttgtccata
      661 tactaatgca tatagggcaa ttcttgcacc aatattaccc gttatggtcc aagaagtatg
      721 gaagtatata ccagaaggat ggttacaagg acaagaacat atagacatta atccgatgcg
      781 tggaaaatgg ccgtttttgg actcaaatac ggaaatcgtc acctcctttg aaaactttg


Simple example: a relational model for biological sequences and features


CREATE Sequence

CREATE TABLE Sequence (
    sequence_id INT NOT NULL AUTO_INCREMENT,
    sequence    LONGTEXT NOT NULL,
    defline     TEXT,
    accession   VARCHAR(255) NOT NULL,
    version     INT DEFAULT 0,
    length      INT DEFAULT 0,
    moltype     INT NOT NULL,
    PRIMARY KEY (sequence_id)
);


CREATE Ontology

CREATE TABLE Ontology (
    ontology_id INT NOT NULL AUTO_INCREMENT,
    term        VARCHAR(255) NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (ontology_id)
);


CREATE Feature

CREATE TABLE Feature (
    feature_id  INT NOT NULL AUTO_INCREMENT,
    sequence_id INT NOT NULL,
    ontology_id INT NOT NULL,
    FOREIGN KEY (sequence_id) REFERENCES Sequence (sequence_id),
    FOREIGN KEY (ontology_id) REFERENCES Ontology (ontology_id),
    PRIMARY KEY (feature_id)
);


CREATE Location

CREATE TABLE Location (
    location_id INT NOT NULL AUTO_INCREMENT,
    feature_id  INT NOT NULL,
    start       INT NOT NULL,
    stop        INT NOT NULL,
    strand      INT NOT NULL,
    FOREIGN KEY (feature_id) REFERENCES Feature (feature_id),
    PRIMARY KEY (location_id)
);


CREATE Qualifier

CREATE TABLE Qualifier (
    qualifier_id INT NOT NULL AUTO_INCREMENT,
    feature_id   INT NOT NULL,
    ontology_id  INT NOT NULL,
    value        TEXT NOT NULL,
    FOREIGN KEY (feature_id) REFERENCES Feature (feature_id),
    FOREIGN KEY (ontology_id) REFERENCES Ontology (ontology_id),
    PRIMARY KEY (qualifier_id)
);
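
To try the five-table model without a MySQL server, the sketch below builds the same schema with Python's built-in sqlite3 module. This is an adaptation, not the lecture's own code: SQLite spells the auto-incrementing key as INTEGER PRIMARY KEY AUTOINCREMENT instead of MySQL's AUTO_INCREMENT, and it only enforces foreign keys after PRAGMA foreign_keys = ON. The create_schema helper is a made-up name.

```python
import sqlite3

# The lecture's schema, adapted to SQLite syntax (an assumption of this sketch).
SCHEMA = """
CREATE TABLE Sequence (
    sequence_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sequence    LONGTEXT NOT NULL,
    defline     TEXT,
    accession   VARCHAR(255) NOT NULL,
    version     INT DEFAULT 0,
    length      INT DEFAULT 0,
    moltype     INT NOT NULL
);
CREATE TABLE Ontology (
    ontology_id INTEGER PRIMARY KEY AUTOINCREMENT,
    term        VARCHAR(255) NOT NULL,
    description TEXT NOT NULL
);
CREATE TABLE Feature (
    feature_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    sequence_id INT NOT NULL REFERENCES Sequence (sequence_id),
    ontology_id INT NOT NULL REFERENCES Ontology (ontology_id)
);
CREATE TABLE Location (
    location_id INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id  INT NOT NULL REFERENCES Feature (feature_id),
    start       INT NOT NULL,
    stop        INT NOT NULL,
    strand      INT NOT NULL
);
CREATE TABLE Qualifier (
    qualifier_id INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id   INT NOT NULL REFERENCES Feature (feature_id),
    ontology_id  INT NOT NULL REFERENCES Ontology (ontology_id),
    value        TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create the five tables and switch on foreign-key enforcement."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
]

# A Feature that points at a non-existent Sequence should be refused.
try:
    conn.execute("INSERT INTO Feature (sequence_id, ontology_id) VALUES (99, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(tables, orphan_rejected)
```

The rejected insert shows what the FOREIGN KEY clauses are for: a Feature cannot refer to a Sequence or Ontology row that does not exist.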


INSERT an ontology

mysql> INSERT INTO Ontology (term, description) VALUES
    -> ('start codon', 'denotes an Methionine codon of a transcript');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM Ontology;
+-------------+-------------+---------------------------------------------+
| ontology_id | term        | description                                 |
+-------------+-------------+---------------------------------------------+
|           3 | start codon | denotes an Methionine codon of a transcript |
+-------------+-------------+---------------------------------------------+
1 row in set (0.01 sec)


INSERT some more ontologies

mysql> INSERT INTO Ontology (term, description) VALUES
    -> ('exon', 'an exon in genomic sequence');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO Ontology (term, description) VALUES
    -> ('exon type', '3\'UTR, initial, internal, terminal, 5\'UTR');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM Ontology;
+-------------+-------------+---------------------------------------------+
| ontology_id | term        | description                                 |
+-------------+-------------+---------------------------------------------+
|           3 | start codon | denotes an Methionine codon of a transcript |
|           4 | exon        | an exon in genomic sequence                 |
|           5 | exon type   | 3'UTR, initial, internal, terminal, 5'UTR   |
+-------------+-------------+---------------------------------------------+
3 rows in set (0.00 sec)


INSERT a sequence

mysql> DESC Sequence;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| sequence_id | int(11)      |      | PRI | NULL    | auto_increment |
| sequence    | longtext     |      |     |         |                |
| defline     | text         | YES  |     | NULL    |                |
| accession   | varchar(255) |      |     |         |                |
| version     | int(11)      | YES  |     | 0       |                |
| length      | int(11)      | YES  |     | 0       |                |
| moltype     | int(11)      |      |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)

mysql> INSERT INTO Sequence (sequence, defline, accession, version, length, moltype)
    -> VALUES ('ATGACGATCAGCATCAGCTACAGCTG', '> seq1', 'seq1', 1, 26, 1);
Query OK, 1 row affected (0.00 sec)


INSERT a Feature on a sequence

mysql> SELECT * FROM Sequence;
+-------------+----------------------------+---------+-----------+---------+--------+---------+
| sequence_id | sequence                   | defline | accession | version | length | moltype |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
|           2 | ATGACGATCAGCATCAGCTACAGCTG | > seq1  | seq1      |       1 |     26 |       1 |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
1 row in set (0.03 sec)

mysql> SELECT * FROM Ontology;
+-------------+-------------+---------------------------------------------+
| ontology_id | term        | description                                 |
+-------------+-------------+---------------------------------------------+
|           3 | start codon | denotes an Methionine codon of a transcript |
|           4 | exon        | an exon in genomic sequence                 |
|           5 | exon type   | 3'UTR, initial, internal, terminal, 5'UTR   |
+-------------+-------------+---------------------------------------------+
3 rows in set (0.00 sec)

mysql> INSERT INTO Feature (sequence_id, ontology_id)
    -> VALUES (2, 3);
Query OK, 1 row affected (0.00 sec)


INSERT a Location
mysql> SELECT * From Feature;
+------------+-------------+-------------+
| feature_id | sequence_id | ontology_id |
+------------+-------------+-------------+
|          1 |           2 |           3 |
+------------+-------------+-------------+
1 row in set (0.01 sec)

mysql> DESC Location;
+-------------+---------+------+-----+---------+----------------+
| Field       | Type    | Null | Key | Default | Extra          |
+-------------+---------+------+-----+---------+----------------+
| location_id | int(11) |      | PRI | NULL    | auto_increment |
| feature_id  | int(11) |      |     | 0       |                |
| start       | int(11) |      |     | 0       |                |
| stop        | int(11) |      |     | 0       |                |
| strand      | int(11) |      |     | 0       |                |
+-------------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> INSERT INTO Location (feature_id, start, stop, strand)
    -> VALUES(1,1,3,1);
Query OK, 1 row affected (0.02 sec)
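
In the session above the new feature_id is read off a SELECT and typed into the next INSERT by hand. From a program the generated key can be picked up directly. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL, cut-down versions of the Feature and Location tables, and a made-up helper named add_feature; in MySQL itself the LAST_INSERT_ID() function serves the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Feature (
    feature_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    sequence_id INT NOT NULL,
    ontology_id INT NOT NULL
);
CREATE TABLE Location (
    location_id INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id  INT NOT NULL,
    start       INT NOT NULL,
    stop        INT NOT NULL,
    strand      INT NOT NULL
);
""")


def add_feature(conn, sequence_id, ontology_id, start, stop, strand=1):
    """Insert a Feature and its Location together; return the new feature_id."""
    with conn:  # both rows are committed together or not at all
        cur = conn.execute(
            "INSERT INTO Feature (sequence_id, ontology_id) VALUES (?, ?)",
            (sequence_id, ontology_id),
        )
        feature_id = cur.lastrowid  # key generated by the INSERT above
        conn.execute(
            "INSERT INTO Location (feature_id, start, stop, strand)"
            " VALUES (?, ?, ?, ?)",
            (feature_id, start, stop, strand),
        )
    return feature_id


# Same values as the slide: a start codon (ontology 3) on sequence 2 at 1..3.
fid = add_feature(conn, 2, 3, 1, 3)
locations = conn.execute(
    "SELECT feature_id, start, stop, strand FROM Location"
).fetchall()
print(fid, locations)
```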


Queries using SELECT


mysql> SELECT * FROM Sequence;
+-------------+----------------------------+---------+-----------+---------+--------+---------+
| sequence_id | sequence                   | defline | accession | version | length | moltype |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
|           2 | ATGACGATCAGCATCAGCTACAGCTG | > seq1  | seq1      |       1 |     26 |       1 |
|           3 | SLKLSKLPSPLYQVCLE          | > seq2  | L32174    |       1 |     17 |       3 |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
2 rows in set (0.00 sec)

mysql> SELECT sequence FROM Sequence WHERE accession = 'seq1';
+----------------------------+
| sequence                   |
+----------------------------+
| ATGACGATCAGCATCAGCTACAGCTG |
+----------------------------+
1 row in set (0.12 sec)

mysql> SELECT length FROM Sequence WHERE sequence_id = 3;
+--------+
| length |
+--------+
|     17 |
+--------+
1 row in set (0.03 sec)


Joining tables
mysql> SELECT * FROM Feature;
+------------+-------------+-------------+
| feature_id | sequence_id | ontology_id |
+------------+-------------+-------------+
|          1 |           2 |           3 |
|          2 |           2 |           4 |
|          3 |           2 |           4 |
+------------+-------------+-------------+
3 rows in set (0.04 sec)

Return the descriptions of the features in the Feature table

mysql> SELECT feature_id, description
    -> FROM Feature, Ontology
    -> WHERE Feature.ontology_id = Ontology.ontology_id;
+------------+---------------------------------------------+
| feature_id | description                                 |
+------------+---------------------------------------------+
|          1 | denotes an Methionine codon of a transcript |
|          2 | an exon in genomic sequence                 |
|          3 | an exon in genomic sequence                 |
+------------+---------------------------------------------+
3 rows in set (0.04 sec)
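
The query above joins the two tables by listing both after FROM and matching keys in WHERE. The same join is often written with an explicit JOIN ... ON clause. The sketch below, which assumes Python's built-in sqlite3 module in place of MySQL and reloads the slide's rows into simplified tables, runs both forms and shows that they return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ontology (ontology_id INTEGER PRIMARY KEY, term TEXT, description TEXT);
CREATE TABLE Feature  (feature_id INTEGER PRIMARY KEY, sequence_id INT, ontology_id INT);

INSERT INTO Ontology VALUES
    (3, 'start codon', 'denotes an Methionine codon of a transcript'),
    (4, 'exon',        'an exon in genomic sequence'),
    (5, 'exon type',   '3''UTR, initial, internal, terminal, 5''UTR');
INSERT INTO Feature VALUES (1, 2, 3), (2, 2, 4), (3, 2, 4);
""")

# Join written as on the slide: two tables in FROM, key match in WHERE.
implicit = conn.execute(
    "SELECT feature_id, description"
    " FROM Feature, Ontology"
    " WHERE Feature.ontology_id = Ontology.ontology_id"
    " ORDER BY feature_id"
).fetchall()

# The same join with an explicit JOIN ... ON clause and table aliases.
explicit = conn.execute(
    "SELECT f.feature_id, o.description"
    " FROM Feature AS f"
    " JOIN Ontology AS o ON f.ontology_id = o.ontology_id"
    " ORDER BY f.feature_id"
).fetchall()

for row in explicit:
    print(row)
```

The explicit form keeps the join condition apart from any row filters in WHERE, which tends to help once more than two tables are involved.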


Setting up a complex query

Consider sequence seq1 with the following features:
  Initial exon from 1..6
  Internal exon from 15..20
Note that with the relational model the term "exon" appears only once in the database


Complex query

mysql> SELECT * FROM Sequence WHERE sequence_id = 2;
+-------------+----------------------------+---------+-----------+---------+--------+---------+
| sequence_id | sequence                   | defline | accession | version | length | moltype |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
|           2 | ATGACGATCAGCATCAGCTACAGCTG | > seq1  | seq1      |       1 |     26 |       1 |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
1 row in set (0.04 sec)

mysql> SELECT * FROM Feature WHERE sequence_id = 2;
+------------+-------------+-------------+
| feature_id | sequence_id | ontology_id |
+------------+-------------+-------------+
|          1 |           2 |           3 |
|          2 |           2 |           4 |
|          3 |           2 |           4 |
+------------+-------------+-------------+
3 rows in set (0.03 sec)

mysql> SELECT * FROM Location;
+-------------+------------+-------+------+--------+
| location_id | feature_id | start | stop | strand |
+-------------+------------+-------+------+--------+
|           1 |          1 |     1 |    3 |      1 |
|           2 |          2 |     1 |    6 |      1 |
|           3 |          3 |    15 |   20 |      1 |
+-------------+------------+-------+------+--------+
3 rows in set (0.20 sec)

The relational model stores data efficiently and optimises the modifiability of the data. What if "exon" changes to something else?

mysql> SELECT * FROM Ontology;
+-------------+-------------+---------------------------------------------+
| ontology_id | term        | description                                 |
+-------------+-------------+---------------------------------------------+
|           3 | start codon | denotes an Methionine codon of a transcript |
|           4 | exon        | an exon in genomic sequence                 |
|           5 | exon type   | 3'UTR, initial, internal, terminal, 5'UTR   |
+-------------+-------------+---------------------------------------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM Qualifier;
+--------------+------------+-------------+----------+
| qualifier_id | feature_id | ontology_id | value    |
+--------------+------------+-------------+----------+
|            1 |          2 |           5 | initial  |
|            2 |          3 |           5 | internal |
+--------------+------------+-------------+----------+
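
The slides show the five tables but not a single query that ties them together. One possible query is sketched below; it is an illustration rather than the lecturer's own statement. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, trimmed-down tables loaded with the rows shown above, and a LEFT JOIN so that features without a qualifier (the start codon) still appear. For each feature on seq1 it returns the ontology term, the exon type, the coordinates and the bases at that location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sequence  (sequence_id INTEGER PRIMARY KEY, sequence TEXT, accession TEXT);
CREATE TABLE Ontology  (ontology_id INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE Feature   (feature_id INTEGER PRIMARY KEY, sequence_id INT, ontology_id INT);
CREATE TABLE Location  (location_id INTEGER PRIMARY KEY, feature_id INT,
                        start INT, stop INT, strand INT);
CREATE TABLE Qualifier (qualifier_id INTEGER PRIMARY KEY, feature_id INT,
                        ontology_id INT, value TEXT);

INSERT INTO Sequence  VALUES (2, 'ATGACGATCAGCATCAGCTACAGCTG', 'seq1');
INSERT INTO Ontology  VALUES (3, 'start codon'), (4, 'exon'), (5, 'exon type');
INSERT INTO Feature   VALUES (1, 2, 3), (2, 2, 4), (3, 2, 4);
INSERT INTO Location  VALUES (1, 1, 1, 3, 1), (2, 2, 1, 6, 1), (3, 3, 15, 20, 1);
INSERT INTO Qualifier VALUES (1, 2, 5, 'initial'), (2, 3, 5, 'internal');
""")

QUERY = """
SELECT o.term, q.value, l.start, l.stop,
       substr(s.sequence, l.start, l.stop - l.start + 1) AS bases
FROM Sequence AS s
JOIN Feature  AS f ON f.sequence_id = s.sequence_id
JOIN Ontology AS o ON o.ontology_id = f.ontology_id
JOIN Location AS l ON l.feature_id  = f.feature_id
LEFT JOIN Qualifier AS q ON q.feature_id = f.feature_id
WHERE s.accession = ?
ORDER BY l.start, l.stop
"""

rows = conn.execute(QUERY, ("seq1",)).fetchall()
for row in rows:
    print(row)
```

Because both exon features point at the single Ontology row with ontology_id 4, renaming the term means updating one row, and every feature picks up the change through the join.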


Aggregate queries

mysql> SELECT * FROM Sequence;
+-------------+----------------------------+---------+-----------+---------+--------+---------+
| sequence_id | sequence                   | defline | accession | version | length | moltype |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
|           2 | ATGACGATCAGCATCAGCTACAGCTG | > seq1  | seq1      |       1 |     26 |       1 |
|           3 | SLKLSKLPSPLYQVCLE          | > seq2  | L32174    |       1 |     17 |       3 |
|           4 | MASQQQCGAR                 | > seq   | seq3      |       1 |     10 |       3 |
+-------------+----------------------------+---------+-----------+---------+--------+---------+

mysql> SELECT count(*), moltype from Sequence GROUP BY moltype;
+----------+---------+
| count(*) | moltype |
+----------+---------+
|        1 |       1 |
|        2 |       3 |
+----------+---------+
2 rows in set (0.08 sec)
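
GROUP BY works with other aggregate functions besides count(*). As a small extension of the slide's query, the sketch below also reports the shortest and longest sequence per moltype. It assumes Python's built-in sqlite3 module instead of MySQL and a Sequence table reduced to the columns the query needs, filled with the three rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sequence ("
    " sequence_id INTEGER PRIMARY KEY, accession TEXT, length INT, moltype INT)"
)
conn.executemany(
    "INSERT INTO Sequence VALUES (?, ?, ?, ?)",
    [(2, "seq1", 26, 1), (3, "L32174", 17, 3), (4, "seq3", 10, 3)],
)

# One output row per moltype: how many sequences, and their length range.
summary = conn.execute(
    "SELECT moltype, COUNT(*), MIN(length), MAX(length)"
    " FROM Sequence GROUP BY moltype ORDER BY moltype"
).fetchall()
print(summary)
```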


Using LIMIT

mysql> SELECT * FROM Sequence LIMIT 2;
+-------------+----------------------------+---------+-----------+---------+--------+---------+
| sequence_id | sequence                   | defline | accession | version | length | moltype |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
|           2 | ATGACGATCAGCATCAGCTACAGCTG | > seq1  | seq1      |       1 |     26 |       1 |
|           3 | SLKLSKLPSPLYQVCLE          | > seq2  | L32174    |       1 |     17 |       3 |
+-------------+----------------------------+---------+-----------+---------+--------+---------+
2 rows in set (0.08 sec)


UPDATING a table

mysql> SELECT * FROM Qualifier;
+--------------+------------+-------------+----------+
| qualifier_id | feature_id | ontology_id | value    |
+--------------+------------+-------------+----------+
|            1 |          2 |           5 | initial  |
|            2 |          3 |           5 | internal |
+--------------+------------+-------------+----------+
2 rows in set (0.00 sec)

mysql> UPDATE Qualifier SET value = 'terminal'
    -> WHERE qualifier_id = 2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


DELETING from a table

mysql> DELETE FROM Qualifier
    -> WHERE qualifier_id = 2;
Query OK, 1 row affected (0.04 sec)

mysql> SELECT * FROM Qualifier;
+--------------+------------+-------------+---------+
| qualifier_id | feature_id | ontology_id | value   |
+--------------+------------+-------------+---------+
|            1 |          2 |           5 | initial |
+--------------+------------+-------------+---------+
1 row in set (0.03 sec)



mysql> DESC Ontology;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| ontology_id | int(11)      |      | PRI | NULL    | auto_increment |
| term        | varchar(255) |      |     |         |                |
| description | text         |      |     |         |                |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> INSERT INTO Qualifier (feature_id, ontology_id, value)
    -> (2, 5, 'initial');
ERROR 1064: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near '2, 5, 'initial')' at line 2

mysql> INSERT INTO Qualifier (feature_id, ontology_id, value)
    -> VALUES (2, 5, 'initial');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO Qualifier (feature_id, ontology_id, value)
    -> VALUES (3, 5, 'internal');
Query OK, 1 row affected (0.00 sec)


Optimisation

Perking up MySQL
  Queries
  Database server


Indexing

In general, indexing your data makes retrieval orders of magnitude faster
Consider a list of 1,000,000 sequences with accession numbers
You need to find the one sequence with accession number AC123456
Without an index on the accession field, the lookup takes O(n) = O(1000000) operations
  Equivalent to scanning through a list
With an index on the accession field, the lookup takes O(log n) operations: about 20 comparisons for a binary search (log2 of 1000000), or 6 if counted in base 10
  Somewhat like looking up a word in a sorted dictionary (most MySQL indexes are B-trees rather than hash tables)
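
The difference between scanning and an indexed lookup can be seen by asking the database how it plans to run a query. The sketch below is a hedged illustration using Python's built-in sqlite3 module, not MySQL: SQLite's EXPLAIN QUERY PLAN plays the role that EXPLAIN plays in MySQL, the exact wording of its output varies between SQLite versions, and the 10,000 generated accessions are placeholders. The plan helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sequence ("
    " sequence_id INTEGER PRIMARY KEY,"
    " accession TEXT NOT NULL,"
    " sequence TEXT NOT NULL)"
)
# Placeholder data: accessions AC000000 .. AC009999 with a dummy sequence.
conn.executemany(
    "INSERT INTO Sequence (accession, sequence) VALUES (?, ?)",
    ((f"AC{i:06d}", "ACGT") for i in range(10000)),
)


def plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


query = "SELECT * FROM Sequence WHERE accession = ?"

before = plan(conn, query, ("AC001234",))  # expected to mention a full SCAN
conn.execute("CREATE INDEX acindex ON Sequence (accession)")
after = plan(conn, query, ("AC001234",))   # expected to mention acindex

hit = conn.execute(query, ("AC001234",)).fetchone()
print(before)
print(after)
print(hit)
```

Before the index exists the plan is a scan of the whole table; afterwards it is a search that uses acindex, which is the O(n) versus O(log n) contrast described above.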


Types of indexes

PRIMARY KEY
To identify the main accessor field of the table

UNIQUE
Constraint to ensure that all entries in a field are different

INDEX
Creates a way to quickly search on a given field

FULLTEXT
For large TEXT fields > 255 characters

Compound indexes (column1, column2, ...)

NOTE: INDEX is synonymous with KEY
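
Three of these index types can be tried out in a few lines. The sketch below assumes Python's built-in sqlite3 module rather than MySQL, a cut-down Sequence table, and an invented rule that the pair (accession, version) must be unique; FULLTEXT is left out because its syntax is specific to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sequence (
    sequence_id INTEGER PRIMARY KEY,      -- PRIMARY KEY: main accessor field
    accession   VARCHAR(255) NOT NULL,
    version     INT DEFAULT 0,
    moltype     INT NOT NULL,
    UNIQUE (accession, version)           -- UNIQUE compound index on two columns
);
CREATE INDEX moltype_index ON Sequence (moltype);  -- plain INDEX for fast search
""")

insert = "INSERT INTO Sequence (accession, version, moltype) VALUES (?, ?, ?)"
conn.execute(insert, ("L32174", 1, 3))
conn.execute(insert, ("L32174", 2, 3))  # same accession, new version: allowed

try:
    conn.execute(insert, ("L32174", 1, 3))  # exact duplicate of the first row
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM Sequence").fetchone()[0]
print(duplicate_rejected, count)
```

The UNIQUE index doubles as a constraint: the duplicate row is refused, while the plain index on moltype only speeds up searches and accepts repeated values.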


Drawbacks to indexing
  Need more disk space
  Can slow down inserts
Know your data and the queries you will perform on the data
  Only index fields you think you will query on
  Requires spending time in the design phase to define requirements of the database


Creating an index

mysql> CREATE INDEX acindex ON Sequence (accession);
Query OK, 1 row affected (0.18 sec)
Records: 1  Duplicates: 0  Warnings: 0


Tuning the database

> mysqladmin variables
> mysqld --help

DBA


Variables (--variable-name=value)
and boolean options {FALSE|TRUE}  Value (after reading options)
--------------------------------- ----------------------------
basedir                           /raid/db/mysql/mysql-max4.0.14-pc-linux-i686/
bdb-home                          (No default value)
bdb-logdir                        (No default value)
bdb-tmpdir                        (No default value)
bind-address                      (No default value)
console                           FALSE
chroot                            (No default value)
character-sets-dir                /raid/db/mysql/mysql-max4.0.14-pc-linux-i686/share/mysql/charsets/
datadir                           /raid/db/mysql/mysql-max4.0.14-pc-linux-i686/data/
default-character-set             latin1
enable-locking                    FALSE
enable-pstack                     FALSE
gdb                               FALSE
innodb_data_home_dir              (No default value)
innodb_log_group_home_dir         (No default value)
innodb_log_arch_dir               (No default value)
innodb_flush_log_at_trx_commit    1
innodb_flush_method               (No default value)
innodb_fast_shutdown              TRUE
innodb_max_dirty_pages_pct        90
init-file                         (No default value)
log                               (No default value)
language                          /raid/db/mysql/mysql-max4.0.14-pc-linux-i686/share/mysql/english/
local-infile                      TRUE
log-bin                           (No default value)
log-bin-index                     (No default value)
log-isam                          myisam.log
log-update                        (No default value)
log-slow-queries                  (No default value)
log-slave-updates                 FALSE
low-priority-updates              FALSE
master-host                       (No default value)
master-user                       test
master-port                       3306


master-connect-retry              60
master-retry-count                86400
master-info-file                  master.info
master-ssl                        FALSE
master-ssl-key                    (No default value)
master-ssl-cert                   (No default value)
master-ssl-capath                 (No default value)
master-ssl-cipher                 (No default value)
myisam-recover                    OFF
memlock                           FALSE
disconnect-slave-event-count      0
abort-slave-event-count           0
max-binlog-dump-events            0
sporadic-binlog-dump-fail         FALSE
new                               FALSE
old-protocol                      10
old-rpl-compat                    FALSE
pid-file                          /raid/db/mysql/mysql-max4.0.14-pc-linux-i686/data/watson.pid
log-error
port                              3306
report-host                       (No default value)
report-user                       (No default value)
report-password                   (No default value)
report-port                       3306
rpl-recovery-rank                 0
relay-log                         (No default value)
relay-log-index                   (No default value)
safe-user-create                  FALSE
server-id                         1
show-slave-auth-info              FALSE
concurrent-insert                 TRUE
skip-grant-tables                 FALSE
skip-slave-start                  FALSE
relay-log-info-file               relay-log.info
slave-load-tmpdir                 /raid/tmp/
socket                            /tmp/mysql.sock
sql-bin-update-same               FALSE
sql-mode                          OFF
temp-pool                         TRUE
tmpdir                            /raid/tmp


external-locking                  FALSE
use-symbolic-links                TRUE
symbolic-links                    TRUE
log-warnings                      FALSE
warnings                          FALSE
back_log                          50
bdb_cache_size                    8388600
bdb_log_buffer_size               0
bdb_max_lock                      10000
bdb_lock_max                      10000
binlog_cache_size                 32768
connect_timeout                   5
delayed_insert_timeout            300
delayed_insert_limit              100
delayed_queue_size                1000
flush_time                        0
ft_min_word_len                   4
ft_max_word_len                   254
ft_max_word_len_for_sort          20
ft_stopword_file                  (No default value)
innodb_mirrored_log_groups        1
innodb_log_files_in_group         2
innodb_log_file_size              5242880
innodb_log_buffer_size            1048576
innodb_buffer_pool_size           8388608
innodb_additional_mem_pool_size   1048576
innodb_file_io_threads            4
innodb_lock_wait_timeout          50
innodb_thread_concurrency         8
innodb_force_recovery             0
interactive_timeout               28800
join_buffer_size                  131072
key_buffer_size                   402653184
long_query_time                   10
lower_case_table_names            FALSE
max_allowed_packet                1047552
max_binlog_cache_size             4294967295
max_binlog_size                   1073741824
max_connections                   100
max_connect_errors                10
max_delayed_threads               20
max_heap_table_size               16777216



max_join_size                     18446744073709551615
max_relay_log_size                0
max_seeks_for_key                 4294967295
max_sort_length                   1024
max_tmp_tables                    32
max_user_connections              0
max_write_lock_count              4294967295
bulk_insert_buffer_size           8388608
myisam_block_size                 1024
myisam_max_extra_sort_file_size   268435456
myisam_max_sort_file_size         2147483647
myisam_repair_threads             1
myisam_sort_buffer_size           67108864
net_buffer_length                 16384
net_retry_count                   10
net_read_timeout                  30
net_write_timeout                 60
open_files_limit                  0
query_cache_limit                 1048576
query_cache_size                  33554432
query_cache_type                  1
read_buffer_size                  2093056
read_rnd_buffer_size              262144
record_buffer                     2093056
relay_log_space_limit             0
slave_compressed_protocol         FALSE
slave_net_timeout                 3600
read-only                         FALSE
slow_launch_time                  2
sort_buffer_size                  2097144
table_cache                       512
thread_concurrency                8
thread_cache_size                 8
tmp_table_size                    33554432
thread_stack                      196608
wait_timeout                      28800
default-week-format               0

To see what values a running MySQL server is using, type 'mysqladmin variables' instead of 'mysqld --help'.
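Once the variables have been printed, it can be handy to load them into a program for comparison or reporting. The following is a minimal sketch, assuming the output arrives as a bordered two-column table of `| name | value |` rows; that layout, and the names `parse_variables` and `SAMPLE`, are assumptions made for illustration. The sample values are copied from the listing in this lecture.

```python
def parse_variables(table_text):
    """Turn '| name | value |' rows into a dict of strings.

    Assumes a bordered two-column table; border lines (+----+----+)
    and the header row are skipped. Values containing '|' are not handled.
    """
    variables = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border or blank line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # header row or unexpected shape
        name, value = cells
        variables[name] = value
    return variables


# Hypothetical captured output, using values from the listing above.
SAMPLE = """\
+--------------------+---------+
| Variable_name      | Value   |
+--------------------+---------+
| max_allowed_packet | 1047552 |
| max_connections    | 100     |
| query_cache_size   | 33554432 |
+--------------------+---------+
"""

settings = parse_variables(SAMPLE)
print(settings["max_connections"])
```

All values are kept as strings; convert with `int()` only for the variables known to be numeric.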


Tuning the system to your needs

Need to think about uses of the database
  How many concurrent connections?
  Will there be large records?
  Will there be repetitive queries?
  Will I need large indexes?
Tuning the system can give huge gains in performance and lets you get the most out of the system
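The question about concurrent connections matters because some buffers are allocated per connection. A rough sketch of the arithmetic, assuming the commonly quoted rule of thumb for MyISAM-era servers (key buffer plus per-connection read and sort buffers times the connection limit); the function name is hypothetical and the result is an upper-bound estimate, not a measurement. The numbers are the values shown in the variable listing earlier in this lecture.

```python
def estimate_peak_memory(key_buffer_size, read_buffer_size,
                         sort_buffer_size, max_connections):
    """Rule-of-thumb upper bound, in bytes:
    key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections
    """
    per_connection = read_buffer_size + sort_buffer_size
    return key_buffer_size + per_connection * max_connections


# Values taken from the listing shown earlier.
peak_bytes = estimate_peak_memory(
    key_buffer_size=402653184,
    read_buffer_size=2093056,
    sort_buffer_size=2097144,
    max_connections=100,
)
print(peak_bytes)                      # total in bytes
print(round(peak_bytes / 2**20, 1))    # the same figure in MiB
```

Doubling `max_connections` roughly doubles the per-connection share, which is why the expected number of concurrent users is worth settling before raising buffer sizes.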


Important parameters

max_allowed_packet
Largest amount of data to be transmitted to the client in 1 packet

max_connections
The largest number of concurrent connections to the database server

datadir
The location of the data files on the system

query_cache_size
Size of the cache that holds results of repeated queries

Many, many others...
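For sequence data, max_allowed_packet is often the first limit to bite, because a whole statement or row has to fit in one packet. A small sketch of a pre-flight check, assuming single-byte characters (the listing shows latin1 as the default character set) and a hypothetical allowance of 1024 bytes for the rest of the SQL statement; `fits_in_packet` is an illustrative name, not part of MySQL.

```python
def fits_in_packet(record, max_allowed_packet, statement_overhead=1024):
    """Return True if the record plus an assumed overhead fits in one packet.

    statement_overhead is a made-up allowance for the surrounding SQL text.
    """
    record_bytes = len(record.encode("latin-1"))
    return record_bytes + statement_overhead <= max_allowed_packet


MAX_ALLOWED_PACKET = 1047552  # value from the listing earlier in this lecture

short_sequence = "ACGT" * 125000   # 500,000 bases
long_sequence = "ACGT" * 500000    # 2,000,000 bases

print(fits_in_packet(short_sequence, MAX_ALLOWED_PACKET))
print(fits_in_packet(long_sequence, MAX_ALLOWED_PACKET))
```

If large records are expected, this kind of check suggests either raising max_allowed_packet on the server or splitting the records before loading.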


COMMUNICATING WITH MySQL


Communicating with MySQL

Through a GUI
  MySQL Control Center (MySQL CC)
    http://www.mysql.com/products/mysqlcc/
    Standalone application supported by MySQL

Through the web
  phpMyAdmin
    http://www.phpmyadmin.net/home_page/
    Works with the Apache web server

Through the Unix command line
  MySQL client
    Comes with MySQL

Through APIs (Application Programming Interfaces)
  MySQL C API
  Perl DBI
  MySQL++ (C++)
    http://dev.mysql.com/downloads/other/plusplus/
  JDBC (Java Database Connectivity)
    Java protocol and API for RDBMS communication
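The APIs listed above share the same basic pattern: open a connection, execute a statement with parameters, fetch the rows, close. The sketch below shows that pattern with Python's DB-API, which is not one of the APIs on the slide and is used here only as an illustration. To keep it runnable without a server it uses the built-in sqlite3 module as a stand-in; with a MySQL driver the connect call and the placeholder style (typically `%s` rather than `?`) would differ. The table `sequence` and its columns are hypothetical.

```python
import sqlite3


def fetch_long_sequences(connection, min_length):
    """Return (name, length) rows at least min_length long, ordered by name."""
    cursor = connection.cursor()
    # Parameters are passed separately from the SQL text, never pasted in.
    cursor.execute(
        "SELECT name, length FROM sequence WHERE length >= ? ORDER BY name",
        (min_length,),
    )
    rows = cursor.fetchall()
    cursor.close()
    return rows


# Stand-in database: in-memory SQLite instead of a MySQL server.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sequence (name TEXT, length INTEGER)")
connection.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("seqA", 1200), ("seqB", 300), ("seqC", 4500)],
)
connection.commit()

long_sequences = fetch_long_sequences(connection, 1000)
print(long_sequences)
connection.close()
```

Keeping the query in a function that accepts any open connection makes it easier to move the same code from a test database to a production server.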


Communicating with MySQL

Choose the method that is right for the job
  Administration: MySQL CC, phpMyAdmin
  Standalone application: APIs
  Web application: PHP/Java servlets
  Low-throughput queries: command line client


Topics not covered

MySQL tools
  mysqldump
    Tool to dump a schema, the data, or both
  mysqlimport
    Tool to import delimited files
    Look before you parse!
  mysqladmin
    For DBAs to create databases, change passwords, etc.
Read the MySQL documentation
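"Look before you parse" can be as simple as counting the fields on every line of a delimited file before handing it to an import tool. A minimal sketch, assuming a tab-delimited file with one record per line; the function name and the sample data are hypothetical.

```python
import io
from collections import Counter


def field_count_report(handle, delimiter="\t"):
    """Map 'number of fields' to 'number of lines with that many fields'."""
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\n").split(delimiter)
        counts[len(fields)] += 1
    return dict(counts)


# Hypothetical file contents: the third line is missing a column.
sample = io.StringIO("seqA\tACGT\t4\nseqB\tGGCC\t4\nseqC\tTTAA\n")
report = field_count_report(sample)
print(report)
```

A clean file produces a report with a single key; more than one key means some lines would load with missing or shifted columns.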


Summary

Relational databases are necessary in bioinformatics
Relational databases allow us to efficiently store and query large amounts of data
MySQL is a good choice of RDBMS engine because it is highly functional at no cost


Resources

MySQL
  http://www.mysql.com
  http://dev.mysql.com.mysql/en/index.html
  http://www.mysql.com/products/mysqlcc/
  http://dev.mysql.com/doc/connector/j/en

NAR Database Issue 2004
  http://nar.oupjournals.org/content/vol32/suppl_1
