
Volume 2 No. 7, JULY 2011

ISSN 2079-8407

Journal of Emerging Trends in Computing and Information Sciences


2010-11 CIS Journal. All rights reserved. http://www.cisjournal.org

Pune University Metabolic Pathway Engineering (PuMPE) Resource


A. S. Kolaskar¹, Shweta Kolhi²

¹KIIT University, Bhubaneswar - 751024, India
²Bioinformatics Center, University of Pune, Pune 411007, India
¹akolaskar@yahoo.com, ²shwetakolhi@gmail.com

ABSTRACT
PuMPE is a comprehensive resource that provides integrated information on the metabolomes of bacterial systems. Genome data are annotated to infer metabolic pathways using in-house tools and web-based sources. PuMPE introduces a novel approach to metabolic categorization. It is the first resource to provide a metabolome-based tree computed by comparing metabolomes across bacteria. PuMPE contains metabolic pathway information for 581 bacteria with completely sequenced genomes. Km (Michaelis constant) values, catalytic site data and 3D structures of enzymes are integrated and made available on a single platform. The open-source relational database management system MySQL is used at the backend, and the software used to visualize structures and pathway interactions is also open source. The resource is updated regularly with minimal human intervention. It is user friendly and provides unique integrated information for carrying out metabolic pathway engineering. It is available at http://115.111.37.202/mpe/
Keywords: Metabolic pathways database, pathway interactions, metabolome-based tree, metabolic categorization

1. INTRODUCTION
Advances in instrumentation over the last two decades have led to an exponential increase in biological data. These data take the form of genome and proteome sequences, microarray data on gene expression profiles, metabolome data on metabolic pathways, and so on. Many public domain databases catalogue this information systematically. These databases can be general or specific in nature. NCBI [1], EBI [2] and Ensembl [3] are examples of public domain databases holding general molecular biology information, while the Stanford Microarray Database [4], the Catalytic Site Atlas [5] and miRBase, the microRNA database [6], are a few examples of databases holding specific biological data. These static databases contain enormous amounts of information on genes and proteins. However, continuous dynamic interaction with the environment is an important property of any living system, and hence there is a need for a comprehensive resource that provides information on dynamic interactions between genes, proteins and ligands. Metabolic pathway databases are one example of resources that capture such dynamic interactions. Metabolism is one of the better-documented biological processes and represents an interacting network of genes. Metabolic pathway databases such as BioCyc [7] and KEGG PATHWAY [8] provide extensive organism-specific pathway data. However, they do not include data on how pathways interact with one another within the metabolome, or on relationships between organisms based on their metabolomes. Furthermore, enzyme kinetics data, which are important for metabolic pathway engineering, are also absent from these databases. Including such information would better reflect the behaviour of an organism. To study the biology of an organism at the molecular level in a holistic manner, these data need to be catalogued systematically in a user-friendly form from which knowledge can then be extracted.

Software tools are also needed to analyse the data in such databases and extract knowledge relevant to the user. Together, these databases and software tools become important resources: they help users build their own programs to engineer metabolic pathways and assist in designing new biological species or cells. The Pune University Metabolic Pathways Engineering resource is one such attempt.

2. PuMPE DESCRIPTION
The Pune University Metabolic Pathways Engineering (PuMPE) resource contains primary as well as derived data that are useful for carrying out metabolic pathway engineering. The database includes metabolic pathway information for bacterial systems whose genomes are fully sequenced. PuMPE contains all the information available in KEGG PATHWAY for fully sequenced bacteria, organized according to the BioCyc ontology. In addition to the KEGG PATHWAY data, several new primary and derived data types have been added to increase the utility of the resource. Unique features of PuMPE include metabolic pathway categorization, a metabolome-based tree, visualization of the interaction of each pathway with the rest of the metabolome, and Km (Michaelis constant) values [10]. PuMPE also provides information on catalytic sites [5], 3D structures of enzymes [9], choke point enzymes, dynamic links to the literature database PubMed, and more. The data are organized in a MySQL relational database at the backend, with a user-friendly front end. PuMPE currently holds metabolic pathway information on 581 bacteria with completely sequenced genomes, covering 1750 pathways and 10201 reactions.


2.1 Structure and implementation of PuMPE


PuMPE consists of modules for data acquisition and curation. The data are organized in the relational database management system MySQL; the schema is given in Figure 1. PuMPE is composed of 11 linked tables, whose contents are listed in Table 1.

Table 1: Names and contents of tables in PuMPE

The query system has been developed using ASP, and the user-friendly web interface is written in HTML with JavaScript. Parsing, annotation and data updates have been automated to minimize human intervention. The workflow of data collection and analysis in PuMPE is given in Figure 2. As can be seen, data are fetched from various sources, and genome annotation is undertaken using tools developed in-house to identify pathways and to analyse them to extract knowledge.

Figure 1: Schema of PuMPE

Figure 2: PuMPE workflow

In addition to these primary data elements, derived data elements such as the categorization of metabolic pathways and the metabolome-based bacterial relationship tree are also incorporated (see Figure 1 for the schema of the database part of the PuMPE resource).

2.2 Data acquisition

The bacterial genome sequences are obtained from the repository of nucleic acid sequences at the NCBI server [1]. Information on the metabolic pathway ontology and pathway enzymes was obtained from BioCyc [7]. The latest release of the PDB is used to obtain 3D structures of the enzymes [9]. Reaction kinetics data and enzyme catalytic site data are obtained from BRENDA [10] and the Catalytic Site Atlas (CSA) [5], respectively. Drug target data specific to bacterial systems were retrieved from DrugBank [11]. Homology models were built in-house using Insight II with a distance-dependent dielectric constant and no explicit water molecules.

2.3 Data annotation and curation

The usefulness and quality of any data resource depend on the accuracy and currency of the data in it. In PuMPE, special care is taken to improve the annotation and curation of the data. Pathway annotations for each bacterium are enriched using the following approach. An enzyme in a pathway is considered to be present if the query sequence has a bit score ≥ 100 and an E-value ≤ 0.05 against an annotated enzyme that belongs to a closely related species and is known to be present in the same pathway. Further analysis checks whether such a sequence has a catalytic site identical to that of the reference enzyme. If both results are positive, the Shotgun methodology [12] is used to confirm the presence of the enzyme in the pathway. A pathway is marked as present in the bacterium in question only if all the enzymes in the pathway are found to be present. This approach helped to identify additional pathways, which are included in PuMPE and marked with an asterisk (*).
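The enrichment rule above can be expressed as a small decision function. The sketch below is illustrative only and is not PuMPE's code: it assumes the thresholds read bit score ≥ 100 and E-value ≤ 0.05, treats the catalytic-site comparison and the Shotgun [12] confirmation as results computed elsewhere and passed in as booleans, and uses a hypothetical `Hit` record and hypothetical function names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Hit:
    """Hypothetical summary of a query sequence's best hit against an
    annotated enzyme from a closely related species."""
    bit_score: float
    e_value: float
    catalytic_site_identical: bool  # catalytic-site comparison, computed elsewhere
    shotgun_confirmed: bool         # Shotgun [12] confirmation, computed elsewhere


def enzyme_present(hit: Optional[Hit], min_bits: float = 100.0,
                   max_evalue: float = 0.05) -> bool:
    """Sequence similarity, catalytic site and Shotgun checks must all pass."""
    if hit is None:
        return False
    similar_enough = hit.bit_score >= min_bits and hit.e_value <= max_evalue
    return similar_enough and hit.catalytic_site_identical and hit.shotgun_confirmed


def pathway_present(enzyme_hits: Dict[str, Optional[Hit]]) -> bool:
    """A pathway is present only if every one of its enzymes is present."""
    return bool(enzyme_hits) and all(enzyme_present(h) for h in enzyme_hits.values())


# Illustrative input keyed by EC number: the second enzyme fails the bit-score cut-off.
hits = {
    "2.7.1.1": Hit(bit_score=250.0, e_value=1e-40, catalytic_site_identical=True, shotgun_confirmed=True),
    "5.3.1.9": Hit(bit_score=95.0, e_value=1e-20, catalytic_site_identical=True, shotgun_confirmed=True),
}
print(pathway_present(hits))  # False
```

Keeping the all-enzymes rule in a separate function mirrors the text: an enzyme-level decision is made first, and a pathway is only marked present when every enzyme-level decision is positive.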

2.4 Data Visualization


Data visualization is done at three different levels:
i) 3D structures of enzymes, whether determined experimentally and available in the PDB or predicted using Insight II, are visualized using Jmol, an open-source molecular viewer written in Java.
ii) An in-house visualizer, written in Java, has been developed to display 2D and 3D structures of metabolites.
iii) The JavaScript Information Visualization Toolkit is used to visualize the interaction of each individual pathway with the remaining pathways through common compounds.

2.5 Search and retrieval of data from database

Users can search enzymes, compounds and pathways. Enzymes can be searched by EC number (the four-part Enzyme Commission number), enzyme name or CAS number (Chemical Abstracts Service registry number). Compounds can be searched by name, formula or CAS number.

The entire list and total number of pathways present in any bacterium can be obtained by selecting the bacterium of interest from a drop-down box (Figure 3). If a particular pathway is present in a bacterium, a logical navigation path is provided: it begins with pathway information, followed by enzyme information (3D structures, PROSITE patterns [13], dynamically generated PubMed links, Km values, amino acid residues in the catalytic site, homology models wherever available, etc.), and finally the nucleotide and protein sequences of the enzymes in the pathway selected by the user.

Figure 3: Retrieval of metabolic pathways from bacteria

3. UTILITIES AT PuMPE

The usefulness of a database increases when analysis utilities are developed alongside it, so that users can extract knowledge from the data. With this aim, the following analysis tools were developed and incorporated in the resource.

(a) Comparison of metabolic pathways between two bacteria
The metabolic pathways of two bacteria can be compared, and the presence or absence of each pathway in one bacterium relative to the other can be identified (Figure 4). This tool is written in ASP with JavaScript and uses the unique id of each pathway to compare the two bacteria and report the presence or absence of a pathway.
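Because every pathway carries a unique id, the two-bacterium comparison in (a) reduces to set operations on pathway ids. A minimal Python sketch, assuming each bacterium's pathways are available as a set of id strings; the function name and the MetaCyc-style ids below are illustrative only and do not come from PuMPE.

```python
from typing import Dict, Set


def compare_pathways(pathways_a: Set[str], pathways_b: Set[str]) -> Dict[str, Set[str]]:
    """Split pathway ids into those shared by two bacteria and those unique to each."""
    return {
        "shared": pathways_a & pathways_b,
        "only_in_a": pathways_a - pathways_b,
        "only_in_b": pathways_b - pathways_a,
    }


# Illustrative MetaCyc-style pathway ids for two hypothetical bacteria.
bacterium_a = {"GLYCOLYSIS", "TCA", "PENTOSE-P-PWY"}
bacterium_b = {"GLYCOLYSIS", "TCA", "ARGSYN-PWY"}
result = compare_pathways(bacterium_a, bacterium_b)
print(sorted(result["only_in_a"]))  # ['PENTOSE-P-PWY']
```

Comparing ids rather than pathway contents is what makes this step cheap; it relies on the ontology assigning the same id to the same pathway in every organism.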

Figure 4: Comparison between metabolic pathways from two organisms.


(b) Metabolic categorization
The organization of the metabolome into categories begins by identifying the core pathways. Core pathways are identified by comparing unique pathway ids among 94 bacteria having 250 annotated metabolic pathways; the pathway ids present in all 94 bacteria constitute the core pathways. Forty-two such core pathways were identified [14]. They form Stage I, the starting point of metabolic categorization. The remaining pathways in every bacterium are then categorized according to their direct or indirect interaction with the Core/Stage I pathways, where two pathways are defined to interact if they share at least one compound. The categorization utility therefore compares the compound ids of each Stage I pathway with the compound ids of each remaining pathway, and pathways sharing compound ids with Stage I pathways are categorized as Stage II pathways. Applying the same logic to the newly categorized pathways and the remaining pathways, the tool categorizes the metabolome iteratively. The process stops when no common compounds exist between the newly categorized pathways and the remaining pathways. The interactions of pathways in different categories are documented in PuMPE and can be browsed (Figure 5) and visualized (Figure 6). As depicted in Figure 6, in graph-theoretic terms each pathway is represented as a node, and interacting pathways in two categories are connected by an edge labelled with their common compound.
The visualization is modular, which avoids the complex interconnectivity of large-scale metabolic networks. Pathway interactions are depicted systematically as a tree, with the query pathway (the pathway for which the user wants to obtain interacting pathways) as the root and its interacting pathways as internal or leaf nodes. Each internal node is in turn connected to its own interacting pathways, and so on, until a leaf node with no further interacting pathways is reached. This simple depiction makes it easy to see the impact of disrupting a particular pathway on the global network, which is useful for understanding the effects of targeting an enzyme with a drug on other metabolic pathways.
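The iterative staging procedure described above amounts to a breadth-first expansion over a pathway graph in which two pathways are adjacent when they share a compound. The Python sketch below is a minimal illustration under stated assumptions, not PuMPE's implementation: each pathway is given as a hypothetical set of compound ids, Stage I is taken as the intersection of pathway-id sets across organisms, and ubiquitous compounds (such as water or ATP) are not filtered out, since the text does not say whether PuMPE excludes them.

```python
from typing import Dict, Set


def core_pathways(organisms: Dict[str, Set[str]]) -> Set[str]:
    """Stage I: pathway ids present in every organism analysed."""
    if not organisms:
        return set()
    return set.intersection(*organisms.values())


def categorize(pathway_compounds: Dict[str, Set[str]], stage1: Set[str]) -> Dict[str, int]:
    """Assign each pathway of one bacterium to a stage (1, 2, 3, ...).

    A pathway joins stage n+1 if it shares at least one compound id with a
    pathway newly placed in stage n. Pathways never reached are left out.
    """
    stage_of = {p: 1 for p in stage1 if p in pathway_compounds}
    newly = set(stage_of)
    remaining = set(pathway_compounds) - newly
    stage = 1
    while newly and remaining:
        stage += 1
        # All compounds occurring in the pathways categorized in the previous round.
        newly_compounds = set().union(*(pathway_compounds[p] for p in newly))
        newly = {p for p in remaining if pathway_compounds[p] & newly_compounds}
        for p in newly:
            stage_of[p] = stage
        remaining -= newly
    return stage_of


# Toy data: hypothetical pathway ids mapped to the compound ids they contain.
pathway_compounds = {
    "P1": {"glucose", "pyruvate"},
    "P2": {"pyruvate", "acetyl-CoA"},
    "P3": {"acetyl-CoA", "malonyl-CoA"},
    "P4": {"sulfate"},
}
stages = categorize(pathway_compounds, stage1={"P1"})
print(stages)  # {'P1': 1, 'P2': 2, 'P3': 3}; P4 shares no compound and stays uncategorized
```

Only the pathways placed in the previous round are compared with the remaining ones, matching the text's rule that each new stage is found from the newly categorized pathways; the loop ends as soon as a round adds nothing.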

Figure 5: Pathway categories

Figure 6: Interactions between different categories of pathways through common compounds

(c) Bacterial family-wise distribution of metabolic pathways

The distribution of each metabolic pathway across a bacterial family can be studied by selecting the bacterial family and the metabolic pathway of interest from the drop-down box (Figure 7). The tool directly shows whether the selected pathway is identical, similar or absent in each bacterium belonging to the selected family. To report a pathway as identical, the tool checks whether the start compound id, the intermediate compound ids and the end compound id are the same in the reference pathway and in the pathway present in the bacterium. A pathway is reported as similar if the start and end compound ids are the same but the intermediate compound ids differ. If the pathway is similar, the alternate pathway reaction ids and the corresponding enzymes are provided as hyperlinks through this utility.
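The identical/similar/absent rule can be written as a comparison of ordered compound-id lists. In this hedged sketch, each pathway is assumed to be represented as a list running from the start compound through the intermediates to the end compound, and intermediates are compared in order; the compound names, the function names and the extra "different" label (for a pathway whose start or end compound differs, a case the text does not describe) are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence


def classify_pathway(reference: Sequence[str], candidate: Optional[Sequence[str]]) -> str:
    """Compare a bacterium's pathway with the reference, both given as ordered
    compound ids: start compound, intermediates, end compound."""
    if not candidate:
        return "absent"
    if list(candidate) == list(reference):
        return "identical"  # same start, intermediates and end
    if candidate[0] == reference[0] and candidate[-1] == reference[-1]:
        return "similar"    # same start and end, different intermediates
    return "different"      # assumption: this case is not described in the text


def family_distribution(reference: Sequence[str],
                        family: Dict[str, Optional[Sequence[str]]]) -> Dict[str, str]:
    """Classify the selected pathway in every bacterium of a family."""
    return {name: classify_pathway(reference, path) for name, path in family.items()}


reference = ["glucose", "glucose-6-phosphate", "pyruvate"]
family = {
    "bacterium_1": ["glucose", "glucose-6-phosphate", "pyruvate"],
    "bacterium_2": ["glucose", "gluconate", "pyruvate"],
    "bacterium_3": None,
}
print(family_distribution(reference, family))
# {'bacterium_1': 'identical', 'bacterium_2': 'similar', 'bacterium_3': 'absent'}
```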

Further, the resource provides a list of choke point enzymes for each bacterium, and these enzymes are mapped onto metabolic pathways. Choke points are critical points in metabolic networks: inactivating a choke point may cause an organism to fail to produce or consume particular metabolites, which could seriously affect its fitness or survival [15]. Choke point enzyme information can therefore be used to identify potential drug targets [16, 17].
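Following the definition used by Yeh et al. [15], a choke point is a reaction that either uniquely consumes a given substrate or uniquely produces a given product in the network. The text does not describe how PuMPE computes choke points, so the sketch below simply applies that definition to a hypothetical reaction table mapping reaction ids to (substrate set, product set) pairs; it ignores reaction reversibility and does not filter currency metabolites.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def choke_points(reactions: Dict[str, Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Return reaction ids that are the only consumer of some substrate or
    the only producer of some product (choke points in the sense of [15])."""
    consumers: Dict[str, Set[str]] = defaultdict(set)
    producers: Dict[str, Set[str]] = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(rid)
        for compound in products:
            producers[compound].add(rid)
    chokes: Set[str] = set()
    for index in (consumers, producers):
        for rids in index.values():
            if len(rids) == 1:
                chokes |= rids
    return chokes


# Hypothetical network: R1 and R2 both convert A to B, so neither is unique;
# R3 alone consumes B and alone produces C.
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"B"}, {"C"}),
}
print(choke_points(reactions))  # {'R3'}
```

Mapping the returned reactions to the enzymes that catalyse them would then give a list of choke point enzymes of the kind described above.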

Figure 7: Family-wise distribution of a pathway

(d) Metabolic pathway profile based metabolome tree
A metabolome tree based on metabolic pathway profiles is computed to understand the relatedness of metabolomes among bacterial species belonging to the same family. Such relationships among bacterial metabolomes may be similar to, or different from, the relationships obtained by comparing full genomes or sets of proteins. The order of biochemical reactions in a pathway evolves differently and depends on the requirement for products as well as on the delicate balance of intermediates. Pathway evolution is therefore a multidimensional process in which biochemical reactions, reaction rates and the order of reactions are optimised. Metabolic pathway profiling provides insight into this aspect of the biology of the bacterial species in a family (Figure 8).

Figure 8: Metabolome based tree

4. DISCUSSIONS
PuMPE uses the BioCyc ontology, which has several advantages over the ontology used in KEGG PATHWAY [18]: it treats the smallest pathway as the unit and assigns a unique id to each such pathway, which facilitates the comparison of pathways. PuMPE has many additional derived data fields that add value to the database and are essential to make PuMPE a useful resource for pathway engineering. The novel aspects of this resource are metabolic categorization and the metabolome-based tree. Metabolic categorization is governed by the interactions of a set of identical pathways (the Stage I pathways, present in all completely sequenced, well-annotated bacteria) with the remaining pathways in a bacterium. This has important implications for drug discovery, where adverse drug reactions can result from incomplete information about the global interactions of metabolic pathways [19]; this is generally observed when a drug target plays a role in more than one pathway [19]. Metabolic categorization can be used to identify targets that participate in a unique pathway with the least global interaction. Non-interacting pathways in each stage can be potential drug targets with minimal side effects, as they do not interfere with the functioning of the rest of the pathways. In addition, choke point enzymes are identified, reported in the resource and mapped onto metabolic pathways. Knowledge of metabolic stages and choke point enzymes will help to make the drug discovery process more efficient and reliable. It has been shown that efficient metabolic engineering can be undertaken by knocking out competing pathways to improve the yield of target metabolites [20, 21, 22]. Knowledge of the global interactions of each pathway in PuMPE can be used to block competing pathways and thus maximize the yield of the required metabolite.
Aligning single or multiple pathways across organisms to infer a metabolome-based tree is known to provide valuable information on the metabolic capabilities of different organisms. Although multiple efforts have been made to infer metabolome-based trees, no single web resource provides this information readily. PuMPE fills this gap by including a metabolome-based tree for each bacterial family. Further, the distribution of each metabolic pathway across the bacteria belonging to a given bacterial family can be browsed in PuMPE to classify pathways as identical, similar or absent across the family. Taken together, PuMPE offers useful information pertaining to systems biology and provides a reliable platform for studying biology in a holistic manner. PuMPE has been developed at the Bioinformatics Centre, University of Pune. Monthly updates of PuMPE are planned. It can be accessed at http://115.111.37.202/mpe/

ACKNOWLEDGEMENT
One of the authors, Shweta Kolhi, acknowledges financial assistance from the Department of Biotechnology Center of Excellence Scheme, Government of India. The authors would also like to acknowledge Dr. Sangeeta Sawant, Mr. Om Prakash Pandey and Miss Deshpande for their help.

REFERENCES
[1] Sayers E, Tanya Barrett, Dennis A. Benson, Evan Bolton, Stephen H. Bryant, Kathi Canese, Vyacheslav Chetvernin, Deanna M. Church, Michael DiCuccio, Scott Federhen, Michael Feolo, Ian M. Fingerman, Lewis Y. Geer, Wolfgang Helmberg, Yuri Kapustin, David Landsman, David J. Lipman, Zhiyong Lu, Thomas L. Madden, Tom Madej, Donna R. Maglott, Aron Marchler-Bauer, Vadim Miller, Ilene Mizrachi, James Ostell, Anna Panchenko, Lon Phan, Kim D. Pruitt, Gregory D. Schuler, Edwin Sequeira, Stephen T. Sherry, Martin Shumway, Karl Sirotkin, Douglas Slotta, Alexandre Souvorov, Grigory Starchenko, Tatiana A. Tatusova, Lukas Wagner, Yanli Wang, W. John Wilbur, Eugene Yaschenko, and Jian Ye. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39(Database issue): D38-D51. Published online 2010 November 20. 2011 January.
[2] Catherine Brooksbank, Graham Cameron, and Janet Thornton. The European Bioinformatics Institute's data resources. Nucleic Acids Res. 38(Database issue): D17-D25. Published online 2010 January. 2010 January.
[3] Flicek P, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Simon Brent, Yuan Chen, Peter Clapham, Guy Coates, Susan Fairley, Stephen Fitzgerald, Leo Gordon, Maurice Hendrix, Thibaut Hourlier, Nathan Johnson, Andreas Kähäri, Damian Keefe, Stephen Keenan, Rhoda Kinsella, Felix Kokocinski, Eugene Kulesha, Pontus Larsson, Ian Longden, William McLaren, Bert Overduin, Bethan Pritchard, Harpreet Singh Riat, Daniel Rios, Graham R. S. Ritchie, Magali Ruffier, Michael Schuster, Daniel Sobral, Giulietta Spudich, Y. Amy Tang, Stephen Trevanion, Jana Vandrovcova, Albert J. Vilella, Simon White, Steven P. Wilder, Amonida Zadissa, Jorge Zamora, Bronwen L. Aken, Ewan Birney, Fiona Cunningham, Ian Dunham, Richard Durbin, Xosé M. Fernández-Suárez, Javier Herrero, Tim J. P. Hubbard, Anne Parker, Glenn Proctor, Jan Vogel and Stephen M. J. Searle. Ensembl 2011. Nucleic Acids Research 39(Database issue): D800-D806. 2011.
[4] Hubble J, Demeter J, Jin H, Mao M, Nitzberg M, Reddy TB, Wymore F, Zachariah ZK, Sherlock G, Ball CA. Implementation of GenePattern within the Stanford Microarray Database. Nucleic Acids Res. 37(Database issue): D898-901. 2009 Jan 1.
[5] Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucl. Acids Res. 32: D129-D133. 2004.
[6] Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36(Database issue): D154-D158. 2008.
[7] Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, Walk TC, Zhang P, Karp PD. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 36(Database issue): D623-31. Epub 2007 Oct 27. 2008 Jan.
[8] Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., and Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38: D355-D360. 2010.
[9] Berman H.M, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne. The Protein Data Bank. Nucleic Acids Research 28: 235-242. 2000.
[10] Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 32(Database issue): D431-433. 2004.
[11] Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34(Database issue): D668-72. 2006 Jan 1.
[12] Pegg S.C, Babbitt P.C. Shotgun: getting more from sequence similarity searches. Bioinformatics 15: 729-740. 1999.
[13] Hulo N., Bairoch A., Bulliard V., Cerutti L., De Castro E., Langendijk-Genevaux P.S., Pagni M., Sigrist C.J.A. The PROSITE database. Nucleic Acids Res. 34: D227-D230. 2006.
[14] Kolaskar A.S., Kolhi Shweta. Categorization of Metabolome in Bacterial Systems. Unpublished manuscript under preparation.
[15] Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB. Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res. 14: 917-924. 2004.
[16] Deepak Perumal, Chu Sing Lim, Kishore R. Sakharkar and Meena K. Sakharkar. Load Points and Choke Points as Nodes for Prioritizing Drug Targets in Pseudomonas aeruginosa. Current Bioinformatics 4: 48-53. 2009.
[17] Dong-Yup Lee, Bevan Kai Sheng Chung, Faraaz N.K. Yusufi, and Suresh Selvarasu. In Silico Genome-Scale Modeling and Analysis for Identifying Anti-Tubercular Drug Targets. Drug Development Research 72: 121-129. 2011.
[18] Green M.L, Karp P.D. The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res. 34: 3687-97. 2006.
[19] Watterson S, Marshall S, Ghazal P. Logic models of pathway biology. Drug Discov Today. 13(9-10): 447-56. Epub 2008 Apr 23. Review. 2008 May.
[20] Jarboe LR, Grabar TB, Yomano LP, Shanmugam KT, Ingram LO. Development of ethanologenic bacteria. Adv. Biochem. Eng. Biotechnol. 108: 237-261. 2007.
[21] Leonard E, Yan Y, Fowler ZL, Li Z, Lim CG, Lim KH, Koffas MA. Strain improvement of recombinant Escherichia coli for efficient production of plant flavonoids. Mol. Pharm. 5: 257-265. 2008.
[22] Causey TB, Shanmugam KT, Yomano LP, Ingram LO. Engineering Escherichia coli for efficient conversion of glucose to pyruvate. Proc. Natl. Acad. Sci. U.S.A. 101: 2235-2240. 2004.
