2007
Course Description
With the availability of genomic, expression (microarray), ChIP-chip, structural, mass
spectrometry, and other biological data, modern molecular biology depends increasingly on
detailed mathematical modeling and on mathematical, computer science, and statistical tools
and algorithms for the analysis and interpretation of the data. Basic objectives of molecular
biology include: (a) Genomics: sequencing and comparing the genomes of different species,
finding the genes, discovering transcription factor motifs and other patterns in a genome, and
constructing phylogenetic trees; (b) Functional Genomics: microarray data analysis and
clustering of genes and experiments, regulation of gene transcription and other biological
processes, determining gene networks on the basis of microarray data, ChIP-chip data, and
cis-regulatory module information; (c) Proteomics: identifying proteins and determining
signal transduction pathways on the basis of tandem mass spectrometry and
immunoprecipitation information; (d) Structural Proteomics: determining the structure and
function of proteins, understanding protein complexes and DNA-protein complexes,
ligand-receptor coupling, docking, and drug design; (e) RNA structure: predicting RNA
secondary structure.
The study of these problems has involved mathematical tools, models, and algorithms from
Bayesian Statistics, Learning Theory and Data Mining, Decision Theory, Computer Science,
and even Algebraic Geometry. The methods include: Hidden Markov Models, Probabilistic
Graphical Models, Context-Free and other grammars, dynamic programming, Molecular Dynamics
and Modeling, Bayesian likelihood, Monte Carlo optimization and simulation algorithms,
classification and regression trees, support vector machines, clustering algorithms, the EM algorithm,
and graph theory.
The main goal of the two-semester course will be to provide a self-contained exposition of
the main mathematical/probabilistic techniques in computational molecular biology and
bioinformatics, together with current state-of-the-art applications of these techniques. The
main topics for the course are listed below (SYLLABUS) under two headings: I. Molecular
Biology Tasks and Problems and II. Mathematical/Statistical/Computational Tools. The
mathematical tools and their biological applications will not be taught separately; they will
be mixed as we go along. The course will begin, for example, with a presentation of the basic
concepts and problems in biochemistry and biology, and an outline of the underpinning
mathematical tools.
None of these books is adequate for the course. Their material will be supplemented by lecture
notes I have written at Brown.
SYLLABUS
(The material below is for both semesters. It is separated into the biological issues and the
mathematical/computational tools for convenience. The lectures will mix the material
appropriately.)
I. Molecular Biology Tasks and Problems
• Chromosomes
2. Sequence Comparison
• Two-Sequence Alignment: Optimal global and local alignments via dynamic programming
3. Gene Finding
II. Mathematical/Statistical/Computational Tools
1. Introduction to Probability
• Random Variables
• Markov Chains
• Poisson Processes
• Gibbs Distributions
• Unbiased Estimation
• Strings
• Graphs
• Algorithms
4. Computational Algorithms
• Dynamic Programming
• The EM algorithm
• Stochastic Optimization
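As an illustration of how the syllabus items fit together, the sketch below computes an optimal global alignment score for two sequences by dynamic programming (the Needleman-Wunsch recurrence). It is only a minimal sketch, not material from the lecture notes: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not course values.
    """
    # prev[j] = best score for aligning the empty prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Option 1: align a[i-1] with b[j-1] (match or mismatch)
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Option 2: a[i-1] against a gap; Option 3: b[j-1] against a gap
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept at a time, so memory is proportional to the length of the second sequence; recovering the alignment itself would require storing the full table or traceback pointers. The local alignment variant (Smith-Waterman) uses the same recurrence but floors each cell at zero and takes the maximum over all cells.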