Biotech and pharmaceutical companies spent $10 billion on hardware, software, and services in 2002.
Source: Gartner
The biotechnology/IT market will increase at a compound annual growth rate (CAGR) of 24% to nearly $38 billion by 2006. Source: IDC Research
Reference: Prof. A.S. Kolaskar Vice Chancellor, University of Pune
Dr J Boateng BIOT 1011 Bioinformatics
GENOMICS
Genetics: the science of genes, heredity, and the variation of organisms. In modern research, genetics provides tools for investigating the function of a particular gene, e.g. analysis of genetic interactions.
Genomics: the study of large-scale genetic patterns across the genome for a given species. It deals with the systematic use of genome information to provide answers in biology, medicine, and industry.
Genomics is the study of sequences, gene organization, and mutations at the DNA level, i.e. the study of information flow within a cell. Genomics has the potential to offer new therapeutic methods for the treatment of some diseases, as well as new diagnostic methods. Major tools and methods related to genomics are bioinformatics, genetic analysis, measurement of gene expression, and determination of gene function.
GENOME COMPARISONS
Species             Chrom.   Genes            Base pairs
Humans              46       28,000-35,000    3.1 billion
Mouse               40       22,500-30,000    3.1 billion
Puffer fish         44       31,000           2.7 million
Malaria mosquito    6        14,000           365 million
Fruit fly           8        14,000           137 million
Roundworm           12       19,000           97 million
E. coli             1        5,000            4.1 million
Many diverse studies require determining the abundance of large numbers of specific DNA or RNA molecules in complex mixtures, for example the changes in mRNA levels of many genes.
Genome analysis entails the prediction of genes in uncharacterized genomic sequences. The 21st century has seen the announcement of the draft version of the human genome sequence. Model organisms have been sequenced in both the plant and animal kingdoms.
GENOMIC ANALYSIS
However, the pace of genome annotation is not matching the pace of genome sequencing. Experimental genome annotation is slow and time-consuming, so there is a demand for computational tools for gene prediction.
Computational gene prediction is relatively simple for prokaryotes, where genes are continuous coding sequences that are transcribed into the corresponding mRNA and then translated into proteins. The process is more complex for eukaryotic cells, where the coding DNA sequence is interrupted by non-coding sequences called introns.
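As a rough illustration of why the prokaryotic case is simpler, gene candidates can be sketched as open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon. The function name `find_orfs`, the `min_codons` cutoff, and the toy sequence are assumptions made for this example; real gene finders also scan the reverse strand and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    'end' is exclusive and includes the stop codon; min_codons counts the
    start and stop codons too. Hypothetical helper, not a production gene finder.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)

if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # made-up sequence
    for start, end, frame in find_orfs(toy):
        print(frame, toy[start:end])
```

In a eukaryotic genome this simple scan breaks down, because introns can interrupt the reading frame between the start and stop codons.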
BIOLOGICAL QUESTIONS
Some of the questions biologists want to answer today are: What part of a DNA sequence codes for a protein, and what part of it is junk DNA? How can the junk DNA be classified as introns, untranslated regions, transposons, dead genes, or regulatory elements? How can a newly sequenced genome be divided into genes (coding) and non-coding regions?
COMPLEXITY IS AN UNDERSTATEMENT?
GENOMIC ANALYSIS: ADVANCES
Advanced methods are particularly amenable to organisms whose entire genome sequences are known, such as S. cerevisiae. It is now practicable to investigate changes of mRNA levels of all yeast open reading frames (ORFs) in one experiment.
Bioinformatic tools to organize and analyze such data
Chip-based analysis of samples
Models of gene networks
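One common way to express a change in mRNA level between two conditions is the log2 fold change. The sketch below assumes two dictionaries of expression values keyed by ORF name; the ORF names, the intensity values, the pseudocount, and the threshold are all hypothetical and chosen only for illustration.

```python
import math

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {orf: log2((treated + p) / (control + p))} for ORFs present in both."""
    return {
        orf: math.log2((treated[orf] + pseudocount) / (control[orf] + pseudocount))
        for orf in control
        if orf in treated
    }

def changed_orfs(fold_changes, threshold=1.0):
    """Return ORFs whose absolute log2 fold change is at least the threshold."""
    return sorted(orf for orf, lfc in fold_changes.items() if abs(lfc) >= threshold)

if __name__ == "__main__":
    # Made-up intensities for three hypothetical ORFs.
    control = {"ORF_A": 99.0, "ORF_B": 49.0, "ORF_C": 199.0}
    treated = {"ORF_A": 399.0, "ORF_B": 49.0, "ORF_C": 49.0}
    lfc = log2_fold_changes(control, treated)
    print(lfc)
    print(changed_orfs(lfc))
```

The pseudocount avoids division by zero for ORFs with no detectable signal; a log scale makes a doubling and a halving symmetric around zero.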
Microarray Technology
Post-genomic Era
Series of omics
Comparative genomics
Structural and functional genomics
Transcriptomics
Proteomics
Metabolomics
Data Mining
Development of new tools for data mining
Sequence alignment
Genome sequencing
Genome comparison
Microarray data analysis
Proteomics data analysis
Small molecular array analysis
Aim: to derive information and gain knowledge from the data
COMPARATIVE GENOMICS
Comparative genomics analyzes and compares genetic material from different species to study evolution, gene function, and inherited disease, and to understand what makes each species unique. It involves the use of computer programs that can line up multiple genomes and look for regions of similarity among them.
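A very simplified way to look for regions of similarity between two sequences is to find short exact words (k-mers) that occur in both, which is roughly how many alignment programs seed their search. The function name `shared_kmers`, the word length, and the toy sequences are assumptions for this sketch; it is not the algorithm of any particular program.

```python
def shared_kmers(seq_a, seq_b, k=8):
    """Return (pos_a, pos_b) pairs where the same k-letter word occurs in both sequences."""
    # Index every k-mer of the first sequence by its start positions.
    index = {}
    for i in range(len(seq_a) - k + 1):
        index.setdefault(seq_a[i:i + k], []).append(i)
    # Look up every k-mer of the second sequence in that index.
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], []):
            hits.append((i, j))
    return hits

if __name__ == "__main__":
    a = "AAACGTACGTTT"  # made-up sequences
    b = "GGCGTACGGG"
    print(shared_kmers(a, b, k=6))
```

Each hit is only a candidate: a real comparison would extend the match in both directions and score it, tolerating mismatches and gaps.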
Comparative
Entire genomes are compared with other entire genomes. Information from many genomes is used to learn more about individual genes.
BACKGROUND
Comparing the human genome with the genomes of different organisms helps to better understand gene structure and function and thereby develop new strategies in the battle against human disease. Comparative genomics also provides a powerful new tool for studying evolutionary changes among organisms.
This helps to identify the genes that are conserved among species along with the genes that give each organism its own unique characteristics. Using computer-based analysis to zero in on the genomic features that have been preserved in multiple organisms over millions of years, researchers will be able to pinpoint the signals that control gene function. This should in turn translate into innovative approaches for treating human disease and improving human health.
The evolutionary perspective may prove extremely helpful in understanding disease susceptibility. For example, chimpanzees do not suffer from some of the diseases that strike humans, such as malaria and AIDS. A comparison of the sequence of genes involved in disease susceptibility may reveal the reasons for this species barrier, thereby suggesting new pathways for prevention of human disease.
Although living creatures look and behave in many different ways, all of their genomes consist of DNA, the chemical chain that makes up the genes that code for thousands of different kinds of proteins. Precisely which protein is produced by a given gene is determined by the sequence in which four chemical building blocks - adenine (A), thymine (T), cytosine (C) and guanine (G) - are laid out along DNA's double-helix structure.
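To make the link between base order and protein concrete, the sketch below translates a DNA coding sequence with the standard genetic code. The compact table construction and the function name `translate` are choices made for this example, and the input sequences are made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # Met-Ala, then stop
    print(translate("ATGGACTAA"))  # one base changed: Met-Asp, then stop
```

Changing a single base in the second codon (GCC to GAC) changes the encoded amino acid from alanine to aspartate, which is the sense in which the base order determines the protein.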
In order for researchers to most efficiently use an organism's genome in comparative studies, data about its DNA must be in large, contiguous segments, anchored to chromosomes and, ideally, fully sequenced. Furthermore, the data needs to be organized for easy access and high-speed analysis by sophisticated computer software. Organisms that have been completely sequenced include: mouse (Mus musculus), human (Homo sapiens), fruit fly (Drosophila melanogaster); and ....................
The fledgling field of comparative genomics has already yielded some dramatic results. For example, a March 2000 study comparing the fruit fly genome with the human genome discovered that about 60 percent of genes are conserved between fly and human. Simply put, the two organisms appear to share a core set of genes. Researchers have found that two-thirds of human cancer genes have counterparts in the fruit fly.
More surprisingly, when scientists inserted a human gene associated with early-onset Parkinson's disease into fruit flies, the flies displayed symptoms similar to those seen in humans with the disorder. This raises the possibility that the tiny insects could serve as a new model for testing therapies aimed at Parkinson's.
[Figure: Venn diagram of proteins in the Human, P.f., and Mosquito genomes. Regions: shared by all genomes; exclusively by Human & P.f.; exclusively by Human & Mosquito; exclusively by P.f. & Mosquito; unique proteins in Human, in P.f. (targets for anti-malarial drugs), and in Mosquito.]
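The regions named in the figure labels above map directly onto set operations. The sketch below assumes each genome has already been reduced to a set of protein family identifiers; the identifiers (`fam1` and so on) and the function name `venn_regions` are hypothetical.

```python
def venn_regions(human, pf, mosquito):
    """Split three sets of protein family IDs into the seven Venn diagram regions."""
    return {
        "all_genomes": human & pf & mosquito,
        "human_pf_only": (human & pf) - mosquito,
        "human_mosquito_only": (human & mosquito) - pf,
        "pf_mosquito_only": (pf & mosquito) - human,
        "unique_human": human - pf - mosquito,
        "unique_pf": pf - human - mosquito,
        "unique_mosquito": mosquito - human - pf,
    }

if __name__ == "__main__":
    # Made-up family identifiers, for illustration only.
    human = {"fam1", "fam2", "fam3", "fam4"}
    pf = {"fam1", "fam2", "fam5", "fam6"}
    mosquito = {"fam1", "fam3", "fam5", "fam7"}
    for region, members in venn_regions(human, pf, mosquito).items():
        print(region, sorted(members))
```

In practice the hard part is deciding when two proteins from different species count as the same family, which requires sequence comparison rather than exact identifier matching.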
Large-scale separation: 2DE, liquid chromatography
Identification: MALDI MS, tandem MS/MS, FT-MS, ..
http:www.bio-itworld.com/archive/031704/horizons_horizons_comm.html
1730
Proteomics: 886,000 hits (2004); 4,700,000 hits (2005)
Genomics: 2,070,000 hits (2004); 16,000,000 hits (2005)
[Figure: genomics analysis compared with proteome analysis. Labels: coding DNA, mRNA, proteome; proteins, peptides, glyco and other modifications; dynamic up/down; variants; poorly archived; linear; 3D; no notion of completion.]
Proteomics vs. Genomics: more differences
Protein dynamic
Fragile molecules
Handling dependent
Labile modification
Protein-interaction
Localization dependent
HTP: MS related (not yet), protein chip (not yet), antibodies array (not yet)
Proteomics:
Original definition: study of the proteins encoded by the genome of a biological sample.
Current definition: study of the whole protein complement of a biological sample (cell, tissue, animal, biological fluid [urine, serum]).
Usually involves high-resolution separation of polypeptides at the front end, followed by mass spectrometry identification and analysis.
Proteomic Technologies
Amino Acid Composition
Array-based Proteomics
2D PAGE
Mass Spectrometry
Structural Proteomics
Informatics (and the challenges facing the Human Proteome Project)
Protein Sequencing
Step 1: fragmenting the protein into peptides.
Step 2: sequencing the peptides by Edman degradation. Separation by HPLC and detection by absorbance at 269 nm.
Array-based Proteomics
Employ two-hybrid assays
Use GFP, FRET, and GST
GFP = green fluorescent protein
FRET = fluorescence resonance energy transfer
GST = glutathione S-transferase, a well-characterized protein used as a marker protein
Protein arrays offer a high-throughput technique for proteome analysis. The small plates can hold many different samples at a time. Research is under way at ORNL to interface array methodologies with mass spectrometry.
2D PAGE
2-D gel electrophoresis is a multi-step procedure that can separate hundreds to thousands of proteins with extremely high resolution. Proteins are separated by isoelectric point (pI) in the first dimension, using an immobilized pH gradient (isoelectric focusing), and then by molecular weight (MW) in the second dimension. 2-DE is the core technology of proteomics: at present, no other technique is capable of simultaneously resolving thousands of proteins in one separation procedure (cited in 2000).
In the past
1. Longer run times.
2. Cumbersome techniques (the soft, thin, long gel rods need excellent experimental technique).
3. Batch-to-batch variation of carrier ampholytes.
4. Patterns not reproducible enough.
5. Loss of most basic proteins and some acidic proteins.
2D PAGE
The 2-D gel electrophoresis process consists of these steps:
Sample preparation
First dimension: isoelectric focusing
Second dimension: gel electrophoresis
A good-looking spot pattern, free of streaks and smears, does not guarantee the best 2-DE protocol.
Mass Spectrometry
Mass Spectrometry is another tool to analyze the proteome. In general a Mass Spectrometer consists of:
Ion Source
Mass Analyzer
Detector
Mass spectrometers are used to measure the mass-to-charge (m/z) ratios of ionized substances. From this measurement a mass is determined, proteins are identified, and further analysis is performed.
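How a mass is determined from an m/z value depends on how the ion was formed. The sketch below assumes positive-mode ionization in which each charge comes from one added proton (as is typical for peptides); that assumption, the function names, and the example mass are not from the lecture.

```python
PROTON_MASS = 1.007276  # Da; mass added per positive charge under the protonation assumption

def mz_from_mass(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion for a molecule of the given neutral mass (Da)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def mass_from_mz(mz, charge):
    """Neutral mass (Da) recovered from an observed m/z and an assumed charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)

if __name__ == "__main__":
    peptide_mass = 1000.0  # made-up neutral mass in Da
    for z in (1, 2, 3):
        print(z, round(mz_from_mass(peptide_mass, z), 4))
```

The same molecule therefore appears at different m/z values depending on its charge state, which is why the charge has to be known or inferred before a mass can be reported.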
MASS SPECTROMETRY
MORE DETAILED MASS SPECTROMETRY APPLICATIONS IN MORNING LECTURE ON 28TH NOVEMBER 2011
What is Bioinformatics?
Conceptualizing biology in terms of molecules and then applying informatics techniques from math, computer science, and statistics to understand and organize the information associated with these molecules on a large scale.
Sequence retrieval: National Center for Biotechnology Information (NCBI), GenBank and other genome databases
Sequence comparison programs: BLAST, GCG, MacVector
% identity
CATTATGATA
GTTTATGATT    70%
MRCKTETGAR
MRCGTETGAR    90%
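The percentages for the two aligned pairs above can be reproduced by counting matching positions. This is a minimal sketch for ungapped sequences of equal length; the function name `percent_identity` is an assumption, and programs such as BLAST compute identity over gapped local alignments instead.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length, ungapped sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

if __name__ == "__main__":
    print(percent_identity("CATTATGATA", "GTTTATGATT"))  # DNA pair from the slide
    print(percent_identity("MRCKTETGAR", "MRCGTETGAR"))  # protein pair from the slide
```

The DNA pair differs at three of ten positions and the protein pair at one of ten, giving 70% and 90% identity respectively.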
Weaknesses:
Sometimes not up-to-date
Limited possibilities
Limited comparisons and information
Not accurate
Human Genome Project
Gene array technology
Comparative genomics
Functional genomics
Data Mining
Handling enormous amounts of data
Sort through what is important and what is not
Manipulate and analyze data to find patterns and variations that correlate with biological function
Proteomics
Uses information determined by biochemical/crystal structure methods
Visualization of protein structure
Make protein-protein comparisons
Used to determine:
- conformation/folding
- antibody binding sites
- protein-protein interactions
- computer aided drug design
[Figure labels: students, educators, bioinformatics, researchers, institutions]