
WORKSHOP

DATA MINING IN GENOMICS

Diego A. Forero, MD, PhD(c)


Applied Molecular Genomics Group, VIB Department of Molecular Genetics, University of Antwerp, Antwerp, Belgium
VIB Laboratory of Developmental Genetics, Catholic University of Leuven, Leuven, Belgium
Unit of Functional Genomics, University of Liege, Liege, Belgium
Grupo de Neurociencias, Facultad de Medicina e Instituto de Genética, Universidad Nacional de Colombia, Bogotá, Colombia
Editor, hum-molgen.org
Email: daforerog@gmail.com
Website: http://www.daforerog.co.cc/

Data mining in genomics comprises several approaches for extracting relevant information
from large biological datasets. This workshop is aimed mainly at students and researchers with
backgrounds in the biological sciences; people with backgrounds in computer science who are interested
in biology are also welcome. Students are invited to bring their own biological questions to explore in the
exercises.

Special emphasis will be given to several freely available tools created in recent years to help
experimentalists with little programming experience mine large biological datasets, including those
available in public repositories and databases. Bioinformatics extensions of general-purpose
programming languages will also be discussed briefly.

A brief description of each topic will be followed by extensive hands-on exercises with
these tools.

Number of students: Between 10 and 20.


Duration: 2 days of in-person sessions, preceded by several days of preparatory reading of the suggested
bibliographic material.
Cost: Free for selected students. Selection will be based on academic background and a strong interest in
genomics.
Level: Undergraduate students, MSc students, PhD students.

Given the duration and approach of the course, the main goal is to introduce advanced tools for
data mining in genomics at an intermediate level. The course is not intended to teach what genomics is or to
provide expert-level training in data mining in genomics.

Students are expected to read all the selected papers before the workshop, so that the sessions can
focus on the advanced and practical aspects of these topics in an interactive and constructive way.

Elements:
-PC with an internet connection for each student
-Selected literature

Day 1.
BRIEF INTRODUCTION
-Generation of data vs analysis of data
-Current trends in data mining in biology

BIOMART AND ADVANCED FEATURES OF THE ENSEMBL GENOME BROWSER


-Types of data available at BioMart
-Retrieval of data from BioMart
-Data available at the Ensembl browser
-Practical Exercises
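
As an illustration of what retrieving data from BioMart can look like outside the web interface, the following minimal sketch builds a BioMart XML query and the corresponding request URL. The service URL, the dataset name (hsapiens_gene_ensembl), the filter (chromosome_name) and the attributes (ensembl_gene_id, external_gene_name) are assumptions chosen for illustration; the names actually available should be checked in the BioMart interface. The code only constructs the request and does not contact the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed values for illustration; verify them against the BioMart interface.
MART_URL = "http://www.ensembl.org/biomart/martservice"
DATASET = "hsapiens_gene_ensembl"


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated output
        "header": "0",               # no header row
        "uniqueRows": "1",           # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


query_xml = build_biomart_query(
    DATASET,
    filters={"chromosome_name": "21"},
    attributes=["ensembl_gene_id", "external_gene_name"],
)

# The XML is passed to the service in the "query" parameter.
request_url = MART_URL + "?" + urlencode({"query": query_xml})
print(request_url)
```

The resulting URL could be opened in a browser or fetched with urllib.request during the exercises; the same query can also be assembled interactively in the BioMart web interface.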

GALAXY AND ADVANCED FEATURES OF THE UCSC GENOME BROWSER


-Advanced use of the Table Browser feature of the UCSC browser
-Creation of user-defined tracks in the UCSC browser
-Practical Exercises
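
User-defined tracks in the UCSC browser are commonly supplied as plain text in BED format: a "track" line followed by tab-separated lines giving chromosome, start (0-based), end (exclusive) and an optional name. The sketch below generates such text in Python. The function name make_bed_track and the example regions are hypothetical and serve only as an illustration.

```python
def make_bed_track(name, description, features):
    """Build the text of a BED custom track.

    `features` is a list of (chrom, start, end, label) tuples in BED
    coordinates: start is 0-based and end is exclusive.
    """
    lines = ['track name="%s" description="%s"' % (name, description)]
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError("invalid interval for %s" % label)
        lines.append("\t".join([chrom, str(start), str(end), label]))
    return "\n".join(lines) + "\n"


# Hypothetical regions, for illustration only.
regions = [
    ("chr21", 1000000, 1000500, "regionA"),
    ("chr21", 1002000, 1002300, "regionB"),
]

track_text = make_bed_track("myRegions", "Example user track", regions)
print(track_text)
```

The printed text can be pasted into the custom track form of the browser or saved to a file and uploaded.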

TAVERNA, MYGRID AND MYEXPERIMENT


-Available services to use with Taverna
-Creation and execution of bioinformatics workflows
-Use of available workflows from myExperiment
-Practical Exercises

Day 2.
iTOOLS, BIOWEKA AND BIOMOBY
-Features of iTools
-Features of BioWeka
-Features of BioMOBY
-Practical Exercises

DATA MINING OF GENOME-WIDE EXPRESSION STUDIES


-Retrieval of data from NCBI GEO
-Retrieval of data from ArrayExpress
-General features of Bioconductor
-Practical Exercises

DATA MINING OF THE SCIENTIFIC LITERATURE IN GENOMICS


-Tools for advanced mining and retrieval of literature in genomics
-Practical Exercises
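
One simple form of automated literature retrieval is a programmatic PubMed search through the NCBI E-utilities. The sketch below only builds the search URL for the esearch service; the helper name build_pubmed_search_url and the example search term are illustrative assumptions, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, retmax=20):
    """Return an esearch URL that searches PubMed for `term`."""
    params = {
        "db": "pubmed",         # database to search
        "term": term,           # query in PubMed syntax
        "retmax": str(retmax),  # maximum number of IDs returned
        "retmode": "json",      # response format
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Example term, for illustration only.
search_url = build_pubmed_search_url(
    '"data mining"[Title] AND genomics[MeSH Terms]', retmax=5
)
print(search_url)
```

Fetching this URL returns a list of PubMed identifiers matching the query, which can then be used to retrieve abstracts for further mining.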

BIOPYTHON, BIOPERL AND BIOJAVA


-General features of Biopython
-General features of BioPerl
-General features of BioJava
-Practical Exercises in Python and Biopython
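
As an example of the kind of exercise that can be done in plain Python before moving on to Biopython, the following minimal sketch computes the reverse complement and GC content of a DNA sequence using only the standard library. The toy sequence is made up for illustration. Biopython offers equivalent functionality, for example the reverse_complement() method of its Seq objects.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


dna = "ATGCGCTA"  # toy sequence, for illustration only
print(reverse_complement(dna))  # TAGCGCAT
print(gc_content(dna))          # 0.5
```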

PERSPECTIVES FOR DATA MINING AND GENOMICS


-The future of -omics approaches
-The future of data mining in genomics

EXTENDED SESSION OF SUPERVISED EXERCISES

Selected References
Links to the full text of the papers are available at:
http://www.daforerog.co.cc/insilico.htm
-Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and
consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
-Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet.
2004 Feb;5(2):101-13.
-Bassi S. A primer on python for life science researchers. PLoS Comput Biol. 2007 Nov;3(11):e199.
-Brazma A, Krestyaninova M, Sarkans U. Standards for systems biology. Nat Rev Genet. 2006
Aug;7(8):593-605.
-Dinov ID, Rubin D, Lorensen W, Dugan J, Ma J, Murphy S, Kirschner B, Bug W, Sherman M, Floratos A,
Kennedy D, Jagadish HV, Schmidt J, Athey B, Califano A, Musen M, Altman R, Kikinis R, Kohane I, Delp
S, Parker DS, Toga AW. iTools: a framework for classification, categorization and integration of
computational biology resources. PLoS ONE. 2008 May 28;3(5):e2265.
-Fernández-Suárez XM, Birney E. Advanced genomic data mining. PLoS Comput Biol. 2008 Sep
26;4(9):e1000121.
-Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I,
Taylor J, Miller W, Kent WJ, Nekrutenko A. Galaxy: a platform for interactive large-scale genome analysis.
Genome Res. 2005 Oct;15(10):1451-5.
-Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J,
Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G,
Smith C, Smyth G, Tierney L, Yang JY, Zhang J. Bioconductor: open software development for
computational biology and bioinformatics. Genome Biol. 2004;5(10):R80.
-Gewehr JE, Szugat M, Zimmer R. BioWeka--extending the Weka framework for bioinformatics.
Bioinformatics. 2007 Mar 1;23(5):651-3.
-Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999
Dec 2;402(6761 Suppl):C47-52.
-Holland RC, Down TA, Pocock M, Prlic A, Huen D, James K, Foisy S, Dräger A, Yates A, Heuer M,
Schreiber MJ. BioJava: an open-source framework for bioinformatics. Bioinformatics. 2008 Sep
15;24(18):2096-7.
-Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using
DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44-57.
-Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and
running workflows of services. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W729-32.
-Jensen LJ, Saric J, Bork P. Literature mining for the biologist: from information retrieval to biological
discovery. Nat Rev Genet. 2006 Feb;7(2):119-29.
-Kitano H. Computational systems biology. Nature. 2002 Nov 14;420(6912):206-10.
-Lee JK, Williams PD, Cheon S. Data mining in genomics. Clin Lab Med. 2008 Mar;28(1):145-66.
-Lee GW, Kim S. Genome data mining for everyone. BMB Rep. 2008 Nov 30;41(11):757-64.
-Moore JH. Bioinformatics. J Cell Physiol. 2007 Nov;213(2):365-9.
-Schattner P. Automated querying of genome databases. PLoS Comput Biol. 2007 Jan 26;3(1):e1.
-Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp
H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD,
Stupka E, Wilkinson MD, Birney E. The Bioperl toolkit: Perl modules for the life sciences. Genome Res.
2002 Oct;12(10):1611-8.
-Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief Bioinform.
2002 Dec;3(4):331-41.

Current Version: March 20, 2009
