
COMPUTATIONAL MOLECULAR BIOLOGY

A Two Semester Course


at
UFC

2007

Instructor: Basilis Gidas


Brown University
USA

Location and Time:


We will have an organizational meeting on Monday, March 12, at 14:00 in the seminar room
(sala dos seminários) on the first floor of the Departamento de Matemática, Bloco 914. At this
meeting we will decide the day and time of the class. The class will meet one day a week for
three hours, with a 15-minute break.

Contact: Basilis Gidas


Sala 01, Departamento de Matematica, Bloco 914
Phone: 085/96030663, e-mail: gidas@mat.ufc.br, gidas@dam.brown.edu

Course Description
With the availability of genomic, expression (microarray), ChIP-chip, structural, mass
spectrometry, and other biological data, modern molecular biology depends increasingly on
detailed mathematical modeling and on mathematical, computer science, and statistical tools and
algorithms for the analysis and interpretation of the data. Basic objectives of molecular
biology include: (a) Genomics: sequencing and comparing the genomes of different species,
finding the genes, discovering transcription factor motifs and other patterns in a genome, and
constructing phylogenetic trees; (b) Functional Genomics: microarray data analysis and
clustering of genes and experiments, regulation of gene transcription and other biological
processes, determining gene networks on the basis of microarray data, ChIP-chip data, and
cis-regulatory module information; (c) Proteomics: identifying proteins and determining signal
transduction pathways on the basis of tandem mass spectrometry and immunoprecipitation
information; (d) Structural Proteomics: determining the structure and function of proteins,
understanding protein complexes and DNA-protein complexes, ligand-receptor coupling, docking,
and drug design; (e) RNA structure: predicting RNA secondary structure.
The study of these problems has involved mathematical tools, models, and algorithms from
Bayesian Statistics, Learning Theory and Data Mining, Decision Theory, Computer Science,
and even Algebraic Geometry. The methods include: Hidden Markov Models, Probabilistic
Graphical Models, context-free and other grammars, dynamic programming, Molecular Dynamics
and Modeling, Bayesian likelihood, Monte Carlo optimization and simulation algorithms,
classification and regression trees, support vector machines, clustering algorithms, the EM
algorithm, and graph theory.
The main goal of the two-semester course is to provide a self-contained exposition of
the main mathematical and probabilistic techniques in computational molecular biology and
bioinformatics, together with current state-of-the-art applications of these techniques. The
main topics of the course are listed below (SYLLABUS) under two headings: I. Molecular Biology
Tasks and Problems and II. Mathematical/Statistical/Computational Tools. The mathematical tools
and their biological applications will not be taught separately; they will be mixed as we go
along. The course will begin, for example, with a presentation of the basic concepts and
problems in biochemistry and biology, and an outline of the underlying mathematical tools.

Textbooks and References:


1. Warren J. Ewens and Gregory R. Grant: Statistical Methods in Bioinformatics,
Springer-Verlag, 2006
2. Lior Pachter and Bernd Sturmfels: Algebraic Statistics for Computational Biology, Cambridge
University Press, 2005
3. Richard C. Deonier, Simon Tavaré, and Michael S. Waterman: Computational Genome Analysis,
Springer, 2005
4. João Setubal and João Meidanis: Introduction to Computational Molecular Biology, PWS
Publishing Company, 1997

None of these books is adequate for the course on its own. Their material will be supplemented
by lecture notes I have written at Brown.
SYLLABUS
(The material below is for both semesters. The topics are separated into biological issues and
mathematical/computational tools for convenience. The lectures will mix the material
appropriately.)
I. Molecular Biology Tasks and Problems

1. Basic Elements of Biochemistry

• Prokaryotic and Eukaryotic Cells

• DNA, RNA, Proteins, and the Central Dogma of Biology

• Replication, Transcription, and Translation

• Chromosomes

2. Sequence Comparison

• Two-Sequence Alignment: Optimal global and local alignments via dynamic programming

• Multiple Sequence Alignment via Dynamic Programming

• Database Search Engines: BLAST, FASTA

• Multiple Sequence Alignment using Hidden Markov Models
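
As a small illustration of the first topic in this unit, optimal global alignment of two
sequences via dynamic programming can be sketched as follows. This is a minimal sketch, not
material from the course notes: the function name and the scoring values (match +1, mismatch -1,
linear gap penalty -2) are assumptions chosen for illustration, and only the optimal score is
computed, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of an optimal global alignment of strings a and b.

    Needleman-Wunsch style recursion with a linear gap penalty.
    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either align a[i-1] with b[j-1], or put a gap in one sequence.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[n][m]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (n+1)(m+1) entries and each is filled in constant time, so the sketch runs in
time proportional to the product of the two sequence lengths.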

3. Gene Finding

• Finite Network Representation of Genes

• Modeling gene signals via Markov chains

• Finding Genes via Dynamic Programming

4. Finding cis-regulatory Motifs

• Statistical Models for Representing Transcription Factor Binding Sites

• Discovering motifs via Monte Carlo algorithms

• Discovering Modules of Transcription Factor Binding Sites

• Finding Motifs Using ChIP-chip Data

5. Analysis of Microarray Data

• Pre-processing the Raw Data

• Clustering Genes and Experiments

• Applications to Classifying Cancer Types

6. Protein Folding and Prediction of RNA secondary structure


• Ab Initio Folding: Design of energy, deterministic minimization procedures, annealing
algorithm, molecular dynamics

• Fold Recognition: Homology methods, threading

• Lattice Models for Protein Folding

• RNA secondary structure prediction using context-free grammars

II. Mathematical Tools for Bioinformatics

1. Introduction to Probability
• Random Variables

• Conditional Probabilities and Expectations

• The Weak and Strong Laws of Large Numbers

• Markov Chains

• Poisson Processes

• Entropy and Related Concepts

• Hidden Markov Models

• Context Free Grammars

• Gibbs Distributions

2. Introduction to Statistical Inference

• Unbiased Estimation

• Cramér-Rao Inequality and Fisher Information Matrix

• Maximum Likelihood Estimation

• Hypothesis Testing and Likelihood Ratio Test

• Rank and other Nonparametric Statistics

• Introduction to Bayesian Statistics

3. Graphs and Suffix Trees

• Strings

• Graphs

• Construction of Suffix Trees

• Algorithms

4. Computational Algorithms
• Dynamic Programming

• The EM algorithm

• Monte Carlo Simulation Algorithms

• Stochastic Optimization

• Clustering Algorithms and Support Vector Machines
