
CS882: Advanced Topics in Bioinformatics
History and Frontier of Bioinformatics
Experience is a dear teacher, but fools
will learn at no other.
-- Benjamin Franklin
Why Study History?
This Course
I will try to introduce bioinformatics problems in the context of
history.
Developments in biology that led to bioinformatics
Sequence Comparison
Genome sequencing
Proteomics
Bioinformatics is too broad an area to be fully surveyed in a course.
This course is a sample of the work in the field.
From the course, I hope you can learn
Many bioinformatics research problems.
How the field of bioinformatics evolved.
How to choose research problems.
How to formulate interesting, useful, and solvable problems.
Evaluation
Students choose between a seminar and a course project.
Seminars
Read several related research articles,
Write a survey, and comment on the significance and impact of the
papers.
Why (bother)? Who (did it)? What (was achieved)? When? How? So
what (the impacts)?
Predict future developments.
Projects
A few small course projects are available for selection.
Some coding is involved.
Write a report.
Both options require a presentation.
Evaluation:
participation (20%)
verbal presentation (40%)
written report (40%)

Ways to Find Survey Topics
A research area (or a sub-area)
Sequence comparison
Genome sequencing
Proteomics
Phylogeny
Gene expression
Protein-protein interaction network
Haplotype
Protein structures
A research problem
Find a research paper (RECOMB 2012, 2011)
Read its references.
Read the papers that cite the important references (Google Scholar is useful).
Survey the history and development of that research problem.
A type of public database
DNA, protein, mutations, SNP, Glycan, protein structures
Research Projects
Select one of the following
Study the genome and proteome similarity
between organisms.
Study the redundancies in the NCBI nr (non-redundant) database.
Succinct representation of redundant data
(compression isn't the only goal; better access is also
important).
EST
Protein
NGS
A QUICK REVIEW OF BACKGROUND
Important Advances in Biology That Led to Bioinformatics
Central Dogma
Ancient Time
Our ancestors knew
something about
genetics.
Inheritance: Things
produce children just
like themselves.
Selective breeding.
Image credit: The Cartoon Guide to Genetics by
Gonick and Wheelis
Gregor Mendel (1822–1884)
A scientist and an Augustinian monk
from Austria.
Mendel studied pea plants.
Genotypes Control Phenotypes
Homozygous
Heterozygous
In this case, there are two alleles (variants) of the same gene, A and a,
for the height of the pea plants.
Each individual has two copies of the gene, one from each parent.
A dominates a.
Hybridization
Homozygous
Heterozygous
Descendants of Heterozygous Pea
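The outcome of crossing two heterozygous plants can be enumerated directly. The sketch below is only an illustration: it assumes each parent passes on either allele with equal probability, and it assumes that the dominant allele A corresponds to a tall plant (the names `cross` and `phenotype` are made up for this example).

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes over all equally likely allele pairings."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sort the two alleles so that "aA" and "Aa" are the same genotype.
        counts["".join(sorted((a, b)))] += 1
    return counts

def phenotype(genotype):
    """Assumption: A (tall) is dominant, so one copy of A is enough."""
    return "tall" if "A" in genotype else "short"

genotypes = cross("Aa", "Aa")          # heterozygous x heterozygous
phenotypes = Counter()
for g, n in genotypes.items():
    phenotypes[phenotype(g)] += n

print(dict(genotypes))    # genotype ratio AA : Aa : aa = 1 : 2 : 1
print(dict(phenotypes))   # phenotype ratio tall : short = 3 : 1
```

Under these assumptions the 3:1 phenotype ratio comes from the 1:2:1 genotype ratio, because AA and Aa plants look the same.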
A Comment
Experiment → Data → Knowledge
In the early days the data was small enough to be
processed by a human.
But today it requires a computer: that is
bioinformatics.
Rediscovery of Mendel's Work
Mendel's work was ignored by the world
until it was rediscovered around 1900 by
Hugo de Vries and Carl Correns,
16 years after Mendel died.

Darwin (1809–1882)
I have called this principle, by which each
slight variation, if useful, is preserved, by the
term Natural Selection.
Evolution
Phylogenetic Trees
In the past, people dug up
fossils to study
evolution.
Now we use characteristics
(e.g. DNA sequences) of
today's species to
computationally
reconstruct the evolutionary
history.
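One simple way to build such a tree is to cluster species by sequence distance. The following is a minimal sketch, assuming already aligned sequences of equal length, Hamming distance as the measure of divergence, and average-linkage clustering (UPGMA). The four sequences are invented toy data, not real genes, and practical methods use far more careful evolutionary models.

```python
from itertools import combinations

def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))

def upgma(seqs):
    """Return a tree as nested tuples by repeatedly merging the closest clusters."""
    # Key: the subtree built so far; value: names of the species it contains.
    clusters = {name: [name] for name in seqs}

    def dist(c1, c2):
        # Average distance over all pairs of members of the two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(hamming(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))

# Hypothetical aligned sequences, for illustration only.
toy = {
    "human":   "ACGTACGTAC",
    "chimp":   "ACGTACGTAT",
    "mouse":   "ACGAACTTAT",
    "chicken": "TCGAAGTTCT",
}
tree = upgma(toy)
print(tree)
```

On this toy input, human and chimp are merged first because their sequences differ at only one position.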
Chromosomes
Walter Sutton (left) and Theodor Boveri (right)
independently developed the chromosome
theory of inheritance in 1902.
Humans have 23 pairs of chromosomes.
Genetic Map
If two genes are on the same
chromosome, they tend to
be inherited together.
(AB, ab) x (AB, ab) will not give Ab if there's
no crossover.
Chromosomal Crossover
Dear bioinformatician:
With this model, can you suggest a way to construct a
genetic map (or linkage map)?
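One possible answer, sketched under simplifying assumptions: the further apart two genes lie on a chromosome, the more often a crossover separates them, so the fraction of recombinant offspring can serve as a distance. The counts and pairwise values below are hypothetical, and the function names are made up for this illustration.

```python
def recombination_fraction(gametes, parental):
    """Fraction of gametes whose allele combination matches neither parental type."""
    recombinant = sum(n for hap, n in gametes.items() if hap not in parental)
    return recombinant / sum(gametes.values())

def order_three_genes(pairwise):
    """Put the two most distant genes at the ends; the third lies between them."""
    left, right = max(pairwise, key=pairwise.get)
    genes = {g for pair in pairwise for g in pair}
    (middle,) = genes - {left, right}
    return [left, middle, right]

# Hypothetical gamete counts from an (AB, ab) individual.
counts = {"AB": 440, "ab": 460, "Ab": 48, "aB": 52}
rf_ab = recombination_fraction(counts, parental={"AB", "ab"})
print(rf_ab)   # 100 recombinants out of 1000 gametes

# Hypothetical recombination fractions for three gene pairs.
pairwise = {("A", "B"): 0.10, ("B", "C"): 0.06, ("A", "C"): 0.15}
print(order_three_genes(pairwise))
```

With these numbers the A-C distance is slightly less than the sum of A-B and B-C, which is what double crossovers would produce, so placing B in the middle is consistent with the data.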
DNA
Base Pairs
4 different nucleotides in DNA.
A, C, G, T
A single strand is a sequence of A, C, G, T.
The complementary strand can be
computed from the pairing rules A-T and C-G. These are base pairs.
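The pairing rule translates into a few lines of code. This is a minimal sketch; because the two strands run in opposite directions, the complementary strand is conventionally written reversed (the "reverse complement").

```python
# Translation table implementing the base-pairing rules A-T and C-G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base complement, in the same left-to-right order."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """The other strand, read in its own direction."""
    return complement(strand)[::-1]

print(complement("ATACTCAC"))          # TATGAGTG
print(reverse_complement("ATACTCAC"))  # GTGAGTAT
```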

DNA Replication
Before the Discovery of DNA
In 1869, DNA was first isolated by the Swiss physician Friedrich
Miescher. He called it "nuclein" because it is found in the nuclei of cells.
In 1878, Albrecht Kossel isolated the non-protein component of
"nuclein", nucleic acid, and later isolated its five
primary nucleobases.
In 1927 Nikolai Koltsov proposed that inherited traits would be
inherited via a "giant hereditary molecule" which would be
made up of "two mirror strands that would replicate in a semi-
conservative fashion using each strand as a template".
First Confirmation of DNA's Role
In 1928, Griffith concluded that
the type II-R strain had been
"transformed" into the lethal
III-S strain by a "transforming
principle" that was somehow
part of the dead III-S strain
bacteria.
In 1944, Oswald Avery, Colin
MacLeod, and Maclyn McCarty
confirmed that DNA was the
transforming principle.
Griffith's Experiment
Discovery of DNA Structure
1952, Rosalind Franklin and Raymond
Gosling used X-ray crystallography to
help visualize the structure of DNA.
1953, James D. Watson and Francis
Crick suggested the first correct
double-helix model of DNA structure.
In 1962, after Franklin's death, Watson,
Crick, and Wilkins jointly received the
Nobel Prize in Physiology or Medicine.
Rosalind Franklin
DNA sequencing
Sanger Sequencing was developed in 1977
and soon became the method of choice.


DNA to be sequenced: ATACTCAC...
Grow the other strand using the target DNA as a
template.
Monomers: A, C, G, T, and a modified A.
If the growing strand incorporates the modified A, then
growth stops.
Sanger Sequencing
Do the experiment for all four bases, and separate different
lengths with gel electrophoresis.
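The read-out logic can be illustrated with an idealized simulation. The sketch below assumes that, for each base, the reaction yields one fragment ending at every position where that base occurs in the grown strand, and that the gel resolves every length perfectly. Real reads are noisy, and the function names are made up for this example.

```python
def terminated_lengths(strand, base):
    """Lengths of the fragments that stop at each occurrence of `base`."""
    return [i + 1 for i, b in enumerate(strand) if b == base]

def read_gel(lanes):
    """Reconstruct the sequence by reading bands from shortest to longest."""
    by_length = {length: base
                 for base, lengths in lanes.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

grown = "ATACTCAC"  # the newly grown strand, used here as toy input
# One reaction (one gel lane) per base, each with its own modified monomer.
lanes = {base: terminated_lengths(grown, base) for base in "ACGT"}

print(lanes["A"])       # [1, 3, 7]
print(read_gel(lanes))  # ATACTCAC
```

Each fragment length appears in exactly one lane, so sorting the bands by length recovers the order of the bases.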
Popularity of DNA Sequencing
The private sector played an important role.
Applied Biosystems made the first automated
DNA sequencer and a lot of money.
ABI 3130
Applied Biosystems
May 1981, the company was founded by two scientists/engineers from Hewlett Packard, Sam
Eletr and Andre Marion.

1982, first commercial instrument, the Model 470A Protein Sequencer. 40 employees, $400K
revenue.
1983, 80 employees, IPO, revenue US$5.9 million. Model 380A DNA Synthesizer. Licensed
automated sequencing technology using fluorescent dyes from CIT.
1984, revenue US$18 million, 200 employees.
1985, revenue US$35 million.
1986, revenue US$52 million. The release of the Model 370A DNA Sequencing System, using
fluorescent tags, revolutionized gene discovery.
1987, revenue US$85 million, 788 employees.
1988, revenue US$132 million, 1000 employees. In that year for the first time, genetic
science reached the milestone of being able to identify individuals by their DNA.
1989, revenue reached nearly $160 million.
1990, the U.S. government approved financing to support the Human Genome Project.
1993, acquired by Perkin Elmer.
Human Genome Project
The Human Genome Project (HGP) was an
international project with the primary goal of
sequencing and annotating the human genome.
October 1990, launch.
2003, finished sequencing and initial analysis.
Publicly funded, more than 3 billion dollars spent.
Celera Corporation
Founded 1998 by PE Corporation and Dr. J. Craig Venter.
Craig Venter's group sequenced the first genome of a free-living organism
(Haemophilus influenzae, 1995) at TIGR (The Institute for Genomic Research).
Competed with the public effort to finish the human genome.
2003, finished the human genome at almost the same time (Venter
announced the victory).
Data:
Public: 13 years, $3 billion
Celera: 5 years, $300 million
Celera had access to prior knowledge.
Competing in Bioinformatics
Gene Myers vs. Jim Kent
In a short time it will be hard to realize how
we managed without the sequence data.
Biology will never be the same again.
-- N. Williams. Closing in on the complete yeast genome
sequence. Science, 268:1560-61. 1995.
The rise of Bioinformatics!
Genome Sequenced, So What?
It turns out that genome sequencing isn't the end of the story.
People call the period after 2003 the post-genome era.
It was a landmark but didn't solve all problems.
First of all, your genome and my genome are different.
1000 Genomes Project.
Second, genes are expressed differently at different
time/cells/conditions.
Gene expression (microarray)
A flash in the pan.
Third, proteins are not only expressed differently, but also
modified.
Proteomics (mass spectrometry).
Human Proteome Organization (HUPO).
Proteomics has started to produce some biomarkers.
Metabolomics, Glycomics
Life is very complex.
Wrap Up
Pre-bioinformatics developments in biology
Emergence of bioinformatics
Bioinformatics deals with data
Initial impacts of bioinformatics
Many more years of challenging problems for
bioinformatics.
Triggered by new measurement technologies
Accelerates the development of new
measurement technologies.
