You are on page 1of 27

letters to nature

Received 7 July; accepted 21 September 1998.


1. Dalbey, R. E., Lively, M. O., Bron, S. & van Dijl, J. M. The chemistry and enzymology of the type 1
signal peptidases. Protein Sci. 6, 11291138 (1997).
2. Kuo, D. W. et al. Escherichia coli leader peptidase: production of an active form lacking a requirement
for detergent and development of peptide substrates. Arch. Biochem. Biophys. 303, 274280 (1993).
3. Tschantz, W. R. et al. Characterization of a soluble, catalytically active form of Escherichia coli leader
peptidase: requirement of detergent or phospholipid for optimal activity. Biochemistry 34, 39353941
(1995).
4. Allsop, A. E. et al. in Anti-Infectives, Recent Advances in Chemistry and Structure-Activity Relationships
(eds Bently, P. H. & O'Hanlon, P. J.) 6172 (R. Soc. Chem., Cambridge, 1997).
5. Black, M. T. & Bruton, G. Inhibitors of bacterial signal peptidases. Curr. Pharm. Des. 4, 133154
(1998).
6. Date, T. Demonstration by a novel genetic technique that leader peptidase is an essential enzyme in
Escherichia coli. J. Bacteriol. 154, 7683 (1983).
7. Whitely, P. & von Heijne, G. The DsbA-DsbB system affects the formation of disulde bonds in
periplasmic but not in intramembraneous protein domains. FEBS Lett. 332, 4951 (1993).
8. Peat, T. S. et al. Structure of the UmuD9 protein and its regulation in response to DNA damage. Nature
380, 727730 (1996).
9. Paetzel, M. et al. Crystallization of a soluble, catalytically active form of Escherichia coli leader
peptidase. Proteins Struct. Funct. Genet. 23, 122125 (1995).
10. van Klompenburg, W. et al. Phosphatidylethanolamine mediated insertion of the catalytic domain of
leader peptidase in membranes. FEBS Lett. 431, 7579 (1998).
11. Kim, Y. T., Muramatsu, T. & Takahashi, K. Identication of Trp 300 as an important residue for
Escherichia coli leader peptidase activity. Eur. J. Biochem. 234, 358362 (1995).
12. Landolt-Marticorena, C., Williams, K. A., Deber, C. M. & Reithmeirer, R. A. Non-random distribution of amino acids in the transmembrane segments of human type I single span membrane proteins.
J. Mol. Biol. 229, 602608 (1993).
13. James, M. N. G. in Proteolysis and Protein Turnover (eds Bond, J. S. & Barrett, A. J.) 18 (Portland,
Brookeld, VT, 1994).
14. Strynadka, N. C. J. et al. Molecular structure of the acyl-enzyme intermediate in b-lactamase at 1.7 A
resolution. Nature 359, 393400 (1992).
15. Manard, R. & Storer, A. C. Oxyanion hole interactions in serine and cysteine proteases. Biol. Chem.
Hoppe-Seyler 373, 393400 (1992).
16. Nicolas, A. et al. Contribution of cutinase Ser 42 side chain to the stabilization of the oxyanion
transition state. Biochemistry 35, 398410 (1996).
17. Paetzel, M. et al. Use of site-directed chemical modication to study an essential lysine in Escherichia
coli leader peptidase. J. Biol. Chem. 272, 999410003 (1997).
18. Paetzel, M. & Dalbey, R. E. Catalytic hydroxyl/amine dyads with serine proteases. Trends Biochem. Sci.
22, 2831 (1997).
19. von Heijne, G. Signal sequences. The limits of variation. J. Mol. Biol. 184, 99105 (1985).
20. Izard, J. W. & Kendall, D. A. Signal peptides: exquisitely designed transport promoters. Mol. Microbiol.
13, 765773 (1994).
21. Matthews, B. W. Solvent content of protein crystals. J. Mol. Biol. 33, 491497 (1968).
22. Otwinowski, Z. in DENZO (eds Sawyer, L., Isaacs, N. & Baily, S.) 5662 (SERC Daresbury Laboratory,
Warrington, UK, 1993).
23. Collaborative Computational Project No. 4 The CCP4 suite: programs for protein crystallography.
Acta Crystallogr. D 50, 760763 (1994).
24. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kieldgaard, M. Improved methods for building protein models
in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110119 (1991).
25. Brunger, A. T. X-PLOR: A System for X-ray Crystallography and NMR (Version 3.1) (Yale Univ. Press,
New Haven, 1987).

26. Tronrud, D. E. Conjugate-direction minimization: an improved method for the renement of


macromolecules. Acta Crystallogr. A 48, 912916 (1992).
27. Wolfe, P. B., Wickner, W. & Goodman, J. M. Sequence of the leader peptidase gene of Escherichia coli
and the orientation of leader peptidase in the bacterial envelope. J. Biol. Chem. 258, 1207312080
(1983).
28. Kraulis, P. G. Molscript: a program to produce both detailed and schematic plots of protein structures.
J. Appl. Crystallogr. 24, 946950 (1991).
29. Nicholls, A., Sharp, K. A. & Honig, B. Protein folding and association: insights from the interfacial and
the thermodynamic properties of hydrocarbons. Proteins Struct. Funct. Genet. 11, 281296 (1991).
30. Meritt, E. A. & Bacon, D. J. Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505
524 (1997).
Acknowledgements. We thank SmithKlineBeecham Pharmaceuticals for penem inhibitor; R. M. Sweet
for use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercury
phosphate; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase;
and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Council
of Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia Medical
Research Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship,
N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association.
Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: natalie@byron.
biochem.ubc.ca).

errata

Reconciling the spectrum


of Sagittarius A* with a
two-temperature
plasma model
Rohan Mahadevan

Nature 394, 651653 (1998)


..................................................................................................................................
A misleading typographical error was introduced into the second
sentence of the bold introductory paragraph of this Letter: the word
``infrared'' should be ``inferred''.
M

Deciphering the biology of Mycobacterium tuberculosis from


the complete genome sequence
S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III,
F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles,
N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail,
M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead
& B. G. Barrell

Nature
393, 537544 (1998)
..........................................................................................................................................................................................................................................................................
As a result of an error during lm output, Table 1 was published with some symbols missing. The correct version can be found at
http://www.sanger.ac.uk and is reproduced again here (following pages).
Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted:
Rv0129c encodes antigen 85C and not 85C9 as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect.
Immun. 59, 372382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our
attention.
The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion
instead of 29, as indicated here:
H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
..........::::::::::::::::::::
:::::::::::::::::::::
BCG.......GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST...

190

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

letters to nature

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

Nature Macmillan Publishers Ltd 1998

191

letters to nature

192

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

letters to nature

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

Nature Macmillan Publishers Ltd 1998

193

letters to nature

194

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

letters to nature

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

Nature Macmillan Publishers Ltd 1998

195

letters to nature

196

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

letters to nature

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

Nature Macmillan Publishers Ltd 1998

197

letters to nature

198

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com

article

Deciphering the biology of


Mycobacterium tuberculosis from
the complete genome sequence

S. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*, S. Gas*,


C. E. Barry III, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies,
K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean,
S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter,
K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell
Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
* Unite de Genetique Moleculaire Bacterienne, and Unite de Genetique Moleculaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National
Institutes of Health, Hamilton, Montana 59840, USA
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
. ............ ............ ............ ........... ............ ............ ............ ........... ............ ............ ............ ........... ............ ............ ............ ........... ............ ............ ............ ............ ...........

Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus.
The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been
determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help
the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs,
contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid
content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding
capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of
glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

Despite the availability of effective short-course chemotherapy


(DOTS) and the Bacille Calmette-Guerin (BCG) vaccine, the
tubercle bacillus continues to claim more lives than any other
single infectious agent1. Recent years have seen increased incidence
of tuberculosis in both developing and industrialized countries, the
widespread emergence of drug-resistant strains and a deadly
synergy with the human immunodeficiency virus (HIV). In 1993,
the gravity of the situation led the World Health Organisation (WHO)
to declare tuberculosis a global emergency in an attempt to heighten
public and political awareness. Radical measures are needed now to
prevent the grim predictions of the WHO becoming reality. The
combination of genomics and bioinformatics has the potential to
generate the information and knowledge that will enable the
conception and development of new therapies and interventions
needed to treat this airborne disease and to elucidate the unusual
biology of its aetiological agent, Mycobacterium tuberculosis.
The characteristic features of the tubercle bacillus include its slow
growth, dormancy, complex cell envelope, intracellular pathogenesis and genetic homogeneity2. The generation time of M. tuberculosis, in synthetic medium or infected animals, is typically ,24
hours. This contributes to the chronic nature of the disease, imposes
lengthy treatment regimens and represents a formidable obstacle for
researchers. The state of dormancy in which the bacillus remains
quiescent within infected tissue may reflect metabolic shutdown
resulting from the action of a cell-mediated immune response that
can contain but not eradicate the infection. As immunity wanes,
through ageing or immune suppression, the dormant bacteria
reactivate, causing an outbreak of disease often many decades
after the initial infection3. The molecular basis of dormancy and
reactivation remains obscure but is expected to be genetically
programmed and to involve intracellular signalling pathways.
The cell envelope of M. tuberculosis, a Gram-positive bacterium
with a G + C-rich genome, contains an additional layer beyond the
peptidoglycan that is exceptionally rich in unusual lipids, glycoliNATURE | VOL 393 | 11 JUNE 1998

pids and polysaccharides4,5. Novel biosynthetic pathways generate


cell-wall components such as mycolic acids, mycocerosic acid,
phenolthiocerol, lipoarabinomannan and arabinogalactan, and
several of these may contribute to mycobacterial longevity, trigger
inflammatory host reactions and act in pathogenesis. Little is
known about the mechanisms involved in life within the macrophage, or the extent and nature of the virulence factors produced by
the bacillus and their contribution to disease.
It is thought that the progenitor of the M. tuberculosis complex,
comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanum
and M. microti, arose from a soil bacterium and that the human
bacillus may have been derived from the bovine form following the
domestication of cattle. The complex lacks interstrain genetic
diversity, and nucleotide changes are very rare6. This is important
in terms of immunity and vaccine development as most of the
proteins will be identical in all strains and therefore antigenic drift
will be restricted. On the basis of the systematic sequence analysis of
26 loci in a large number of independent isolates6, it was concluded
that the genome of M. tuberculosis is either unusually inert or that
the organism is relatively young in evolutionary terms.
Since its isolation in 1905, the H37Rv strain of M. tuberculosis has
found extensive, worldwide application in biomedical research
because it has retained full virulence in animal models of tuberculosis, unlike some clinical isolates; it is also susceptible to drugs and
amenable to genetic manipulation. An integrated map of the 4.4
megabase (Mb) circular chromosome of this slow-growing pathogen had been established previously and ordered libraries of
cosmids and bacterial artificial chromosomes (BACs) were
available7,8.
Organization and sequence of the genome

Sequence analysis. To obtain the contiguous genome sequence, a


combined approach was used that involved the systematic sequence
analysis of selected large-insert clones (cosmids and BACs) as well as

Nature Macmillan Publishers Ltd 1998

537

article
random small-insert clones from a whole-genome shotgun library.
This culminated in a composite sequence of 4,411,529 base pairs
(bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the
second-largest bacterial genome sequence currently available (after
that of Escherichia coli)9. The initiation codon for the dnaA gene, a
hallmark for the origin of replication, oriC, was chosen as the start
point for numbering. The genome is rich in repetitive DNA,
particularly insertion sequences, and in new multigene families
and duplicated housekeeping genes. The G + C content is relatively
constant throughout the genome (Fig. 1) indicating that horizontally transferred pathogenicity islands of atypical base composition
are probably absent. Several regions showing higher than average G
+ C content (Fig. 1) were detected; these correspond to sequences
belonging to a large gene family that includes the polymorphic G +
C-rich sequences (PGRSs).
Genes for stable RNA. Fifty genes coding for functional RNA
molecules were found. These molecules were the three species
produced by the unique ribosomal RNA operon, the 10Sa RNA
involved in degradation of proteins encoded by abnormal messenger RNA, the RNA component of RNase P, and 45 transfer RNAs.
No 4.5S RNA could be detected. The rrn operon is situated
unusually as it occurs about 1,500 kilobases (kb) from the putative
oriC; most eubacteria have one or more rrn operons near to oriC to
exploit the gene-dosage effect obtained during replication10. This
arrangement may be related to the slow growth of M. tuberculosis.
The genes encoding tRNAs that recognize 43 of the 61 possible sense
codons were distributed throughout the genome and, with one

0
4

M. tuberculosis
H37Rv
4,411,529 bp

exception, none of these uses A in the first position of the anticodon,


indicating that extensive wobble occurs during translation. This is
consistent with the high G + C content of the genome and the
consequent bias in codon usage. Three genes encoding tRNAs for
methionine were found; one of these genes (metV) is situated in a
region that may correspond to the terminus of replication (Figs 1,
2). As metV is linked to defective genes for integrase and excisionase,
perhaps it was once part of a phage or similar mobile genetic
element.
Insertion sequences and prophages. Sixteen copies of the promiscuous insertion sequence IS6110 and six copies of the more stable
element IS1081 reside within the genome of H37Rv8. One copy of
IS1081 is truncated. Scrutiny of the genomic sequence led to the
identification of a further 32 different insertion sequence elements,
most of which have not been described previously, and of the 13E12
family of repetitive sequences which exhibit some of the characteristics of mobile genetic elements (Fig. 1). The newly discovered
insertion sequences belong mainly to the IS3 and IS256 families,
although six of them define a new group. There is extensive
similarity between IS1561 and IS1552 with insertion sequence
elements found in Nocardia and Rhodococcus spp., suggesting that
they may be widely disseminated among the actinomycetes.
Most of the insertion sequences in M. tuberculosis H37Rv appear
to have inserted in intergenic or non-coding regions, often near
tRNA genes (Fig. 1). Many are clustered, suggesting the existence of
insertional hot-spots that prevent genes from being inactivated, as
has been described for Rhizobium11. The chromosomal distribution
of the insertion sequences is informative as there appears to have
been a selection against insertions in the quadrant encompassing
oriC and an overrepresentation in the direct repeat region that
contains the prototype IS6110. This bias was also observed experimentally in a transposon mutagenesis study12.
At least two prophages have been detected in the genome
sequence and their presence may explain why M. tuberculosis
shows persistent low-level lysis in culture. Prophages phiRv1 and
phiRv2 are both ,10 kb in length and are similarly organized, and
some of their gene products show marked similarity to those
encoded by certain bacteriophages from Streptomyces and saprophytic mycobacteria. The site of insertion of phiRv1 is intriguing as
it corresponds to part of a repetitive sequence of the 13E12 family
that itself appears to have integrated into the biotin operon. Some
strains of M. tuberculosis have been described as requiring biotin as a
growth supplement, indicating either that phiRv1 has a polar effect
on expression of the distal bio genes or that aberrant excision,
leading to mutation, may occur. During the serial attenuation of M.
bovis that led to the vaccine strain M. bovis BCG, the phiRv1
prophage was lost13. In a systematic study of the genomic diversity
of prophages and insertion sequences (S.V.G. et al., manuscript in
preparation), only IS1532 exhibited significant variability, indicating that most of the prophages and insertion sequences are currently
stable. However, from these combined observations, one can conclude that horizontal transfer of genetic material into the free-living
ancestor of the M. tuberculosis complex probably occurred in nature
before the tubercle bacillus adopted its specialized intracellular
niche.

Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer


circle shows the scale in Mb, with 0 representing the origin of replication. The first
ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue,

Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the

others are pink) and the direct repeat region (pink cube); the second ring inwards

position and orientation of known genes and coding sequences (CDS). We used

shows the coding sequence by strand (clockwise, dark green; anticlockwise, light

the following functional categories (adapted from ref. 20): lipid metabolism

green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12

(black); intermediary metabolism and respiration (yellow); information pathways

REP family, dark pink; prophage, blue); the fourth ring shows the positions of the

(pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange);

PPE family members (green); the fifth ring shows the PE family members (purple,

proteins of unknown function (light green); insertion sequences and phage-

excluding PGRS); and the sixth ring shows the positions of the PGRS sequences

related functions (blue); stable RNAs (purple); cell wall and cell processes (dark

(dark red). The histogram (centre) represents G + C content, with ,65% G + C in

green); PE and PPE protein families (magenta); virulence, detoxification and

yellow, and .65% G + C in red. The figure was generated with software from

adaptation (white). For additional information about gene functions, refer to

DNASTAR.

http://www.sanger.ac.uk.

538

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 393 | 11 JUNE 1998

article
Genes encoding proteins. 3,924 open reading frames were identified in the genome (see Methods), accounting for ,91% of the
potential coding capacity (Figs 1, 2). A few of these genes appear to
have in-frame stop codons or frameshift mutations (irrespective of
the source of the DNA sequenced) and may either use frameshifting
during translation or correspond to pseudogenes. Consistent with
the high G + C content of the genome, GTG initiation codons (35%)
are used more frequently than in Bacillus subtilis (9%) and E. coli
(14%), although ATG (61%) is the most common translational
start. There are a few examples of atypical initiation codons, the
most notable being the ATC used by infC, which begins with ATT in
both B. subtilis and E. coli9,14. There is a slight bias in the orientation
of the genes (Fig. 1) with respect to the direction of replication as
,59% are transcribed with the same polarity as replication,
compared with 75% in B. subtilis. In other bacteria, genes transcribed in the same direction as the replication forks are believed to
be expressed more efficiently9,14. Again, the more even distribution
in gene polarity seen in M. tuberculosis may reflect the slow growth
and infrequent replication cycles. Three genes (dnaB, recA and
Rv1461) have been invaded by sequences encoding inteins (protein
introns) and in all three cases their counterparts in M. leprae also
contain inteins, but at different sites15 (S.T.C. et al., unpublished
observations).
Protein function, composition and duplication. By using various
database comparisons, we attributed precise functions to ,40% of
the predicted proteins and found some information or similarity for
another 44%. The remaining 16% resembled no known proteins
and may account for specific mycobacterial functions. Examination
of the amino-acid composition of the M. tuberculosis proteome by
correspondence analysis16, and comparison with that of other
microorganisms whose genome sequences are available, revealed a
statistically significant preference for the amino acids Ala, Gly, Pro,
Arg and Trp, which are all encoded by G + C-rich codons, and a
comparative reduction in the use of amino acids encoded by A + Trich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach
also identified two groups of proteins rich in Asn or Gly that belong
to new families, PE and PPE (see below). The fraction of the
proteome that has arisen through gene duplication is similar to
that seen in E. coli or B. subtilis (,51%; refs 9, 14), except that the
level of sequence conservation is considerably higher, indicating
that there may be extensive redundancy or differential production
of the corresponding polypeptides. The apparent lack of divergence
following gene duplication is consistent with the hypothesis that
M. tuberculosis is of recent descent6.
General metabolism, regulation and drug resistance

Metabolic pathways. From the genome sequence, it is clear that the


tubercle bacillus has the potential to synthesize all the essential
amino acids, vitamins and enzyme co-factors, although some of the
pathways involved may differ from those found in other bacteria. M.
tuberculosis can metabolize a variety of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids2,17. It is apparent from
genome inspection that, in addition to many functions involved in
lipid metabolism, the enzymes necessary for glycolysis, the pentose
phosphate pathway, and the tricarboxylic acid and glyoxylate cycles
are all present. A large number (,200) of oxidoreductases, oxygenases and dehydrogenases is predicted, as well as many oxygenases
containing cytochrome P450, that are similar to fungal proteins
involved in sterol degradation. Under aerobic growth conditions,
ATP will be generated by oxidative phosphorylation from electron
transport chains involving a ubiquinone cytochrome b reductase
complex and cytochrome c oxidase. Components of several anaerobic phosphorylative electron transport chains are also present,
including genes for nitrate reductase (narGHJI), fumarate reductase
(frdABCD) and possibly nitrite reductase (nirBD), as well as a new
reductase (narX) that results from a rearrangement of a homologue
of the narGHJI operon. Two genes encoding haemoglobin-like
NATURE | VOL 393 | 11 JUNE 1998

proteins, which may protect against oxidative stress or be involved


in oxygen capture, were found. The ability of the bacillus to adapt its
metabolism to environmental change is significant as it not only has
to compete with the lung for oxygen but must also adapt to the
microaerophilic/anaerobic environment at the heart of the
burgeoning granuloma.
Regulation and signal transduction. Given the complexity of the
environmental and metabolic choices facing M. tuberculosis, an
extensive regulatory repertoire was expected. Thirteen putative
sigma factors govern gene expression at the level of transcription
initiation, and more than 100 regulatory proteins are predicted
(Table 1). Unlike B. subtilis and E. coli, in which there are .30 copies
of different two-component regulatory systems14, M. tuberculosis
has only 11 complete pairs of sensor histidine kinases and response
regulators, and a few isolated kinase and regulatory genes. This
relative paucity in environmental signal transduction pathways is
probably offset by the presence of a family of eukaryotic-like serine/
threonine protein kinases (STPKs), which function as part of a
phosphorelay system18. The STPKs probably have two domains: the
well-conserved kinase domain at the amino terminus is predicted to
be connected by a transmembrane segment to the carboxy-terminal
region that may respond to specific stimuli. Several of the predicted
envelope lipoproteins, such as that encoded by lppR (Rv2403), show
extensive similarity to this putative receptor domain of STPKs,
suggesting possible interplay. The STPKs probably function in
signal transduction pathways and may govern important cellular
decisions such as dormancy and cell division, and although their
partners are unknown, candidate genes for phosphoprotein phosphatases have been identified.
Drug resistance. M. tuberculosis is naturally resistant to many
antibiotics, making treatment difficult19. This resistance is due
mainly to the highly hydrophobic cell envelope acting as a permeability barrier4, but many potential resistance determinants are also
encoded in the genome. These include hydrolytic or drug-modifying enzymes such as b-lactamases and aminoglycoside acetyl
transferases, and many potential drugefflux systems, such as 14
members of the major facilitator family and numerous ABC
transporters. Knowledge of these putative resistance mechanisms
will promote better use of existing drugs and facilitate the conception of new therapies.

F2 -22.6%
Glu

Mj
0.1

Tyr

Gly

Val

Ile

Lys

Af
Mth

Ae

Bs

Bb

Arg

Met

Mt

Cys
Asp

Hp
Phe

-0.1

Leu

Sc
Ser

Pro

Hi
Ce

His

Ec
Ss

Thr

Mg

Ala

Trp

Mp

Asn

-0.2

-0.3

Gln

-0.30

-0.15

0.15

0.30

F1 - 55.2%

Figure 3 Correspondence analysis of the proteomes from extensively


sequenced organisms as a function of amino-acid composition. Note the
extreme position of M. tuberculosis and the shift in amino-acid preference
reflecting increasing G + C content from left to right. Abbreviations used: Ae,
Aquifex aeolicus; Af, Archaeoglobus fulgidis; Bb, Borrelia burgdorfei; Bs, B.
subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp,
Helicobacter pylori; Mg, Mycoplasma genitalium; Mj, Methanococcus jannaschi;
Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium
thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp.
strain PCC6803. F1 and F2, first and second factorial axes16.

Nature Macmillan Publishers Ltd 1998

539

article
Figure 4 Lipid metabolism. a, Degradation of

a
OH

host-cell lipids is vital in the intracellular life of

SCoA

echA1-21

SCoA

-Oxidation
(fadA/fadB)

fadE1-36

Citric
acid
cycle

accA1/accD1
accA2/accD2
O

fadD1-36

as well as potential precursors of mycobac-

Purines,
Pyrimidines

Glyoxylate
shunt

SCoA

precursors for many metabolic processes,

NADH, FADH 2

SCoA

SCoA

fadA2-6

O
R

ATP

fadB2-5

M. tuberculosis. Host-cell membranes provide

terial cell-wall constituents, through the


actions of a broad family of b-oxidative

Porphyrins

enzymes encoded by multiple copies in the

Amino acids

genome. These enzymes produce acetyl

Glucose

CoA, which can be converted into many


different metabolites and fuel for the bacteria

S Co A

through the actions of the enzymes of the

CO2

citric acid cycle and the glyoxylate shunt of

OH

this cycle. b, The genes that synthesize

Cell wall and mycobacterial lipids

mycolic acids, the dominant lipid component


of the mycobacterial cell wall, include the
type I fatty acid synthase (fas) and a unique

Host membranes

type II system which relies on extension of a


precursor bound to an acyl carrier protein to

form full-length (,80-carbon) mycolic acids.


The

H
OH
O
H

OH

fas

cma

genes

are

responsible

for

cyclopropanation. c, The genes that produce


phthiocerol dimycocerosate form a large
operon and represent type I (mas) and type
II (the pps operon) polyketide synthase
systems. Functions are colour coordinated.

fabD

mabA

acpM

kasA

kasB

8 (kb)

accD

inhA

cmaA1

Gene

Function

fas

Fatty acid synthase, produces C16 -C 26 acyl-CoA esters


Malonyl-CoA:AcpM acyltransferase, AcpM loading
Acyl carrier protein, meromycolate precursor transport
Ketoacyl acyl carrier protein synthase, chain elongation
Ketoacyl acyl carrier protein synthase, chain elongation
Acetyl-CoA carboxylase, malonyl-CoA synthesis
3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B product
Enoyl-acyl carrier protein reductase
cyclopropane mycolic acid synthase 1, distal-position specific
cyclopropane mycolic acid synthase 2, proximal-position specific

fabD
acpM
kasA
kasB
accD
mabA
inhA
cmaA1
cmaA2

cmaA2

c
CH 3
CH 3
O

ppsA

ppsB

ppsC

ppsD

ppsE

CH 3
CH 3

mas

CH 3
CH 3

10

20

30

OCH 3
CH 3

CH 3
CH 3
CH 3

40 (kb)

Gene

Function

mas

Four rounds extension of C 16,18 using Me-malonyl CoA

ppsA

Extension of C 18 with malony CoA, partial reduction

ppsB

Extension with malony CoA, partial reduction

ppsC

Extension with malony CoA, complete reduction

ppsD

Extension with Me-malony CoA, partial reduction

ppsE

Extension with malony CoA, partial reduction, decarboxylation

540

O
O

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 393 | 11 JUNE 1998

article
Lipid metabolism

Very few organisms produce such a diverse array of lipophilic


molecules as M. tuberculosis. These molecules range from simple
fatty acids such as palmitate and tuberculostearate, through isoprenoids, to very-long-chain, highly complex molecules such as
mycolic acids and the phenolphthiocerol alcohols that esterify with
mycocerosic acid to form the scaffold for attachment of the mycosides. Mycobacteria contain examples of every known lipid and
polyketide biosynthetic system, including enzymes usually found in
mammals and plants as well as the common bacterial systems. The
biosynthetic capacity is overshadowed by the even more remarkable
radiation of degradative, fatty acid oxidation systems and, in total,
there are ,250 distinct enzymes involved in fatty acid metabolism
in M. tuberculosis compared with only 50 in E. coli20.
Fatty acid degradation. In vivo-grown mycobacteria have been
suggested to be largely lipolytic, rather than lipogenic, because of
the variety and quantity of lipids available within mammalian cells
and the tubercle2 (Fig. 4a). The abundance of genes encoding
components of fatty acid oxidation systems found by our genomic
approach supports this proposition, as there are 36 acyl-CoA
synthases and a family of 36 related enzymes that could catalyse
the first step in fatty acid degradation. There are 21 homologous
enzymes belonging to the enoyl-CoA hydratase/isomerase superfamily of enzymes, which rehydrate the nascent product of the acylCoA dehydrogenase. The four enzymes that convert the 3-hydroxy
fatty acid into a 3-keto fatty acid appear less numerous, mainly

a
Representatives of the PE family
PE

~110 aa
PE

~110 aa

PGRS

(GGAGGA) n

0 .. >1,400 aa
unique sequence

0 .. ~500 aa

Representatives of the PPE family


PPE

~180 aa
PPE

~180 aa
PPE

~180 aa

MPTR

(NxGxGNxG) n

~200 .. >3,500 aa
GxxSVPxxW

~200 .. ~400 aa
unique sequence

0 .. ~400 aa

b
PE-PGRS sequence variation in ORF Rv0746
H37Rv
PE

29 aa
PGRS

BCG
PE

PGRS
46 aa

H37Rv .. ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST
..........
:::::::::::::::::
.............................
:::::::::::
...GSGAPGGAGGAAGLWGT-----------------------------GGAGGIGGAST
BCG ....
H37Rv .. VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
BCG ....

.......

H37Rv .. LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
.......
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
BCG ....
H37Rv .. GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF-----------::::::::::::::::::::::::::::::::::::::::::::::::
GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN
BCG ....

.......

H37Rv .. ----------------------------------GGAGGVGGSAGLIGTGGNGGNGGTGA
.......::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::
BCG ....
ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA
H37Rv .. NAGSPGTGGAGGLLLGQNGLNGLP*
::::::::::::::::::::::::
NAGSPGTGGAGGLLLGQNGLNGLP*
BCG ....

.......

Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE
protein families. b, Sequence variation between M. tuberculosis H37Rv and M.
bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF)
Rv0746.
NATURE | VOL 393 | 11 JUNE 1998

because they are difficult to distinguish from other members of the


short-chain alcohol dehydrogenase family on the basis of primary
sequence. The five enzymes that complete the cycle by thiolysis of
the b-ketoester, the acetyl-CoA C-acetyltransferases, do indeed
appear to be a more limited family. In addition to this extensive
set of dissociated degradative enzymes, the genome also encodes the
canonical FadA/FadB b-oxidation complex (Rv0859 and Rv0860).
Accessory activities are present for the metabolism of odd-chain and
multiply unsaturated fatty acids.
Fatty acid biosynthesis. At least two discrete types of enzyme
system, fatty acid synthase (FAS) I and FAS II, are involved in
fatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas)
is a single polypeptide with multiple catalytic activities that generates several shorter CoA esters from acetyl-CoA primers5 and
probably creates precursors for elongation by all of the other fatty
acid and polyketide systems. FAS II consists of dissociable enzyme
components which act on a substrate bound to an acyl-carrier
protein (ACP). FAS II is incapable of de novo fatty acid synthesis but
instead elongates palmitoyl-ACP to fatty acids ranging from 24 to
56 carbons in length17,21. Several different components of FAS II may
be targets for the important tuberculosis drug isoniazid, including
the enoyl-ACP reductase InhA22, the ketoacyl-ACP synthase KasA
and the ACP AcpM21. Analysis of the genome shows that there are
only three potential ketoacyl synthases: KasA and KasB are highly
related, and their genes cluster with acpM, whereas KasC is a more
distant homologue of a ketoacyl synthase III system. The number of
ketoacyl synthase and ACP genes indicates that there is a single FAS
II system. Its genetic organization, with two clustered ketoacyl
synthases, resembles that of type II aromatic polyketide biosynthetic
gene clusters, such as those for actinorhodin, tetracycline and
tetracenomycin in Streptomyces species23. InhA seems to be the sole
enoyl-ACP reductase and its gene is co-transcribed with a fabG
homologue, which encodes 3-oxoacyl-ACP reductase. Both of these
proteins are probably important in the biosynthesis of mycolic acids.
Fatty acids are synthesized from malonyl-CoA and precursors are
generated by the enzymatic carboxylation of acetyl (or propionyl)CoA by a biotin-dependent carboxylase (Fig. 4b). From study of the
genome we predict that there are three complete carboxylase
systems, each consisting of an a- and a b-subunit, as well as three
b-subunits without an a-counterpart. As a group, all of the
carboxylases seem to be more related to the mammalian homologues than to the corresponding bacterial enzymes. Two of these
carboxylase systems (accA1, accD1 and accA2, accD2) are probably
involved in degradation of odd-numbered fatty acids, as they are
adjacent to genes for other known degradative enzymes. They may
convert propionyl-CoA to succinyl-CoA, which can then be incorporated into the tricarboxylic acid cycle. The synthetic carboxylases
(accA3, accD3, accD4, accD5 and accD6) are more difficult to
understand. The three extra b-subunits might direct carboxylation
to the appropriate precursor or may simply increase the total
amount of carboxylated precursor available if this step were ratelimiting.
Synthesis of the paraffinic backbone of fatty and mycolic acids in
the cell is followed by extensive postsynthetic modifications and
unsaturations, particularly in the case of the mycolic acids24,25.
Unsaturation is catalysed either by a FabA-like b-hydroxyacylACP dehydrase, acting with a specific ketoacyl synthase, or by an
aerobic terminal mixed function desaturase that uses both molecular oxygen and NADPH. Inspection of the genome revealed no
obvious candidates for the FabA-like activity. However, three
potential aerobic desaturases (encoded by desA1, desA2 and
desA3) were evident that show little similarity to related vertebrate
or yeast enzymes (which act on CoA esters) but instead resemble
plant desaturases (which use ACP esters). Consequently, the genomic data indicate that unsaturation of the meromycolate chain may
occur while the acyl group is bound to AcpM.
Much of the subsequent structural diversity in mycolic acids is

Nature Macmillan Publishers Ltd 1998

541

article
generated by a family of S-adenosyl-L-methionine-dependent
enzymes, which use the unsaturated meromycolic acid as a substrate
to generate cis and trans cyclopropanes and other mycolates. Six
members of this family have been identified and characterized25 and
two clustered, convergently transcribed new genes are evident in the
genome (umaA1 and umaA2). From the functions of the known
family members and the structures of mycolic acids in M. tuberculosis, it is tempting to speculate that these new enzymes may
introduce the trans cyclopropanes into the meromycolate precursor.
In addition to these two methyltransferases, there are two other
unrelated lipid methyltransferases (Ufa1 and Ufa2) that share
homology with cyclopropane fatty acid synthase of E. coli25.
Although cyclopropanation seems to be a relatively common
modification of mycolic acids, cyclopropanation of plasma-membrane constituents has not been described in mycobacteria. Tuberculostearic acid is produced by methylation of oleic acid, and may
be synthesized by one of these two enzymes.
Condensation of the fully functionalized and preformed meromycolate chain with a 26-carbon a-branch generates full-length
mycolic acids that must be transported to their final location for
attachment to the cell-wall arabinogalactan. The transfer and
subsequent transesterification is mediated by three well-known
immunogenic proteins of the antigen 85 complex26. The genome
encodes a fourth member of this complex, antigen 85C9 ( fbpC2,
Rv0129), which is highly related to antigen 85C. Further studies are
needed to show whether the protein possesses mycolytransferase
activity and to clarify the reason behind the apparent redundancy.
Polyketide synthesis. Mycobacteria synthesize polyketides by several different mechanisms. A modular type I system, similar to that
involved in erythromycin biosynthesis23, is encoded by a very large
operon, ppsABCDE, and functions in the production of
phenolphthiocerol5. The absence of a second type I polyketide
synthase suggests that the related lipids phthiocerol A and B,
phthiodiolone A and phthiotriol may all be synthesized by the
same system, either from alternative primers or by differential
postsynthetic modification. It is physiologically significant that
the pps gene cluster occurs immediately upstream of mas, which
encodes the multifunctional enzyme mycocerosic acid synthase
(MAS), as their products phthiocerol and mycocerosic acid esterify
to form the very abundant cell-wall-associated molecule phthiocerol dimycocerosate (Fig. 4c).
Members of another large group of polyketide synthase enzymes
are similar to MAS, which also generates the multiply methylbranched fatty acid components of mycosides and phthiocerol
dimycocerosate, abundant cell-wall-associated molecules5. Although
some of these polyketide synthases may extend type I FAS CoA
primers to produce other long-chain methyl-branched fatty acids
such as mycolipenic, mycolipodienic and mycolipanolic acids or the
phthioceranic and hydroxyphthioceranic acids, or may even show
functional overlap5, there are many more of these enzymes than
there are known metabolites. Thus there may be new lipid and
polyketide metabolites that are expressed only under certain conditions, such as during infection and disease.
A fourth class of polyketide synthases is related to the plant
enzyme superfamily that includes chalcone and stilbene synthase23.
These polyketide synthases are phylogenetically divergent from all
other polyketide and fatty acid synthases and generate unreduced
polyketides that are typically associated with anthocyanin pigments
and flavonoids. The function of these systems, which are often
linked to apparent type I modules, is unknown. An example is the
gene cluster spanning pks10, pks7, pks8 and pks9, which includes two
of the chalcone-synthase-like enzymes and two modules of an
apparent type I system. The unknown metabolites produced by
these enzymes are interesting because of the potent biological
activities of some polyketides such as the immunosuppressor
rapamycin.
Siderophores. Peptides that are not ribosomally synthesized are
542

made by a process that is mechanistically analogous to polyketide


synthesis23,27. These peptides include the structurally related ironscavenging siderophores, the mycobactins and the exochelins2,28,
which are derived from salicylate by the addition of serine (or
threonine), two lysines and various fatty acids and possible polyketide segments. The mbt operon, encoding one apparent salicylateactivating protein, three amino-acid ligases, and a single module of
a type I polyketide synthase, may be responsible for the biosynthesis
of the mycobacterial siderophores. The presence of only one nonribosomal peptide-synthesis system indicates that this pathway may
generate both siderophores and that subsequent modification of a
single e-amino group of one lysine residue may account for the
different physical properties and function of the siderophores28.
Immunological aspects and pathogenicity

Given the scale of the global tuberculosis burden, vaccination is not


only a priority but remains the only realistic public health intervention that is likely to affect both the incidence and the prevalence
of the disease29. Several areas of vaccine development are promising,
including DNA vaccination, use of secreted or surface-exposed
proteins as immunogens, recombinant forms of BCG and rational
attenuation of M. tuberculosis29. All of these avenues of research will
benefit from the genome sequence as its availability will stimulate
more focused approaches. Genes encoding ,90 lipoproteins were
identified, some of which are enzymes or components of transport
systems, and a similar number of genes encoding preproteins (with
type I signal peptides) that are probably exported by the Secdependent pathway. M. tuberculosis seems to have two copies of
secA. The potent T-cell antigen Esat-6 (ref. 30), which is probably
secreted in a Sec-independent manner, is encoded by a member of a
multigene family. Examination of the genetic context reveals several
similarly organized operons that include genes encoding large ATPhydrolysing membrane proteins that might act as transporters. One
of the surprises of the genome project was the discovery of two
extensive families of novel glycine-rich proteins, which may be of
immunological significance as they are predicted to be abundant
and potentially polymorphic antigens.
The PE and PPE multigene families. About 10% of the coding
capacity of the genome is devoted to two large unrelated families of
acidic, glycine-rich proteins, the PE and PPE families, whose genes
are clustered (Figs 1, 2) and are often based on multiple copies of the
polymorphic repetitive sequences referred to as PGRSs, and major
polymorphic tandem repeats (MPTRs), respectively31,32. The names
PE and PPE derive from the motifs ProGlu (PE) and ProPro
Glu (PPE) found near the N terminus in most cases33. The 99
members of the PE protein family all have a highly conserved Nterminal domain of ,110 amino-acid residues that is predicted to
have a globular structure, followed by a C-terminal segment that
varies in size, sequence and repeat copy number (Fig. 5). Phylogenetic analysis separated the PE family into several subfamilies. The
largest of these is the highly repetitive PGRS class, which contains 61
members; members of the other subfamilies, share very limited
sequence similarity in their C-terminal domains (Fig. 5). The
predicted molecular weights of the PE proteins vary considerably
as a few members contain only the N-terminal domain, whereas
most have C-terminal extensions ranging in size from 100 to 1,400
residues. The PGRS proteins have a high glycine content (up to
50%), which is the result of multiple tandem repetitions of Gly
GlyAla or GlyGlyAsn motifs, or variations thereof.
The 68 members of the PPE protein family (Fig. 5) also have a
conserved N-terminal domain that comprises ,180 amino-acid
residues, followed by C-terminal segments that vary markedly in
sequence and length. These proteins fall into at least three groups,
one of which constitutes the MPTR class characterized by the
presence of multiple, tandem copies of the motif AsnXGlyX
GlyAsnXGly. The second subgroup contains a characteristic,
well-conserved motif around position 350, whereas the third contains

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 393 | 11 JUNE 1998

article
proteins that are unrelated except for the presence of the common
180-residue PPE domain.
The subcellular location of the PE and PPE proteins is unknown
and in only one case, that of a lipase (Rv3097), has a function been
demonstrated. On examination of the protein database from the
extensively sequenced M. leprae15, no PGRS- or MPTR-related
polypeptides were detected but a few proteins belonging to the
non-MPTR subgroup of the PPE family were found. These proteins
include one of the major antigens recognized by leprosy patients,
the serine-rich antigen34. Although it is too early to attribute
biological functions to the PE and PPE families, it is tempting to
speculate that they could be of immunological importance. Two
interesting possibilities spring to mind. First, they could represent
the principal source of antigenic variation in what is otherwise a
genetically and antigenically homogeneous bacterium. Second,
these glycine-rich proteins might interfere with immune responses
by inhibiting antigen processing.
Several observations and results support the possibility of antigenic variation associated with both the PE and the PPE family
proteins. The PGRS member Rv1759 is a fibronectin-binding
protein of relative molecular mass 55,000 (ref. 35) that elicits a
variable antibody response, indicating either that individuals
mount different immune responses or that this PGRS protein
may vary between strains of M. tuberculosis. The latter possibility
is supported by restriction fragment length polymorphisms for
various PGRS and MPTR sequences in clinical isolates33. Direct
support for genetic variation within both the PE and the PPE
families was obtained by comparative DNA sequence analysis (Fig.
5). The gene for the PEPGRS protein Rv0746 of BCG differs from
that in H37Rv by the deletion of 29 codons and the insertion of 46
codons. Similar variation was seen in the gene for the PPE protein
Rv0442 (data not shown). As these differences were all associated
with repetitive sequences they could have resulted from intergenic
or intragenic recombinational events or, more probably, from
strand slippage during replication32. These mechanisms are
known to generate antigenic variability in other bacterial
pathogens36.
There are several parallels between the PGRS proteins and the
EpsteinBarr virus nuclear antigens (EBNAs). Members of both
polypeptide families are glycine-rich, contain extensive GlyAla
repeats, and exhibit variation in the length of the repeat region
between different isolates. The GlyAla repeat region of EBNA1
functions as a cis-acting inhibitor of the ubiquitin/proteasome
antigen-processing pathway that generates peptides presented in
the context of major histocompatibility complex (MHC) class I
molecules37,38. MHC class I knockout mice are very susceptible to M.
tuberculosis, underlining the importance of a cytotoxic T-cell
response in protection against disease3,39. Given the many potential
effects of the PPE and PE proteins, it is important that further
studies are performed to understand their activity. If extensive
antigenic variability or reduced antigen presentation were indeed
found, this would be significant for vaccine design and for understanding protective immunity in tuberculosis, and might even
explain the varied responses seen in different BCG vaccination
programmes40.
Pathogenicity. Despite intensive research efforts, there is little
information about the molecular basis of mycobacterial virulence41.
However, this situation should now change as the genome sequence
will accelerate the study of pathogenesis as never before, because
other bacterial factors that may contribute to virulence are becoming apparent. Before the completion of the genome sequence, only
three virulence factors had been described41: catalase-peroxidase,
which protects against reactive oxygen species produced by the
phagocyte; mce, which encodes macrophage-colonizing factor42;
and a sigma factor gene, sigA (aka rpoV), mutations in which can
lead to attenuation41. In addition to these single-gene virulence
factors, the mycobacterial cell wall4 is also important in pathology,
NATURE | VOL 393 | 11 JUNE 1998

but the complex nature of its biosynthesis makes it difficult to


identify critical genes whose inactivation would lead to attenuation.
On inspection of the genome sequence, it was apparent that four
copies of mce were present and that these were all situated in
operons, comprising eight genes, organized in exactly the same
manner. In each case, the genes preceding mce code for integral
membrane proteins, whereas mce and the following five genes are all
predicted to encode proteins with signal sequences or hydrophobic
stretches at the N terminus. These sets of proteins, about which little
is known, may well be secreted or surface-exposed; this is consistent
with the proposed role of Mce in invasion of host cells42. Furthermore, a homologue of smpB, which has been implicated in intracellular survival of Salmonella typhimurium, has also been
identified43. Among the other secreted proteins identified from
the genome sequence that could act as virulence factors are a
series of phospholipases C, lipases and esterases, which might
attack cellular or vacuolar membranes, as well as several proteases.
One of these phospholipases acts as a contact-dependent haemolysin (N. Stoker, personal communication). The presence of storage
proteins in the bacillus, such as the haemoglobin-like oxygen
captors described above, points to its ability to stockpile essential
growth factors, allowing it to persist in the nutrient-limited environment of the phagosome. In this regard, the ferritin-like proteins,
encoded by bfrA and bfrB, may be important in intracellular survival
as the capacity to acquire enough iron in the vacuole is very
M
limited.

.........................................................................................................................

Methods

Sequence analysis. Initially, ,3.2 Mb of sequence was generated from


cosmids8 and the remainder was obtained from selected BAC clones7 and
45,000 whole-genome shotgun clones. Sheared fragments (1.42.0 kb) from
cosmids and BACs were cloned into M13 vectors, whereas genomic DNA was
cloned in pUC18 to obtain both forward and reverse reads. The PGRS genes
were grossly underrepresented in pUC18 but better covered in the BAC and
cosmid M13 libraries. We used small-insert libraries44 to sequence regions
prone to compression or deletion and, in some cases, obtained sequences from
products of the polymerase chain reaction or directly from BACs7. All shotgun
sequencing was performed with standard dye terminators to minimize compression problems, whereas finishing reactions used dRhodamine or BigDye
terminators (http://www.sanger.ac.uk). Problem areas were verified by using
dye primers. Thirty differences were found between the genomic shotgun
sequences and the cosmids; twenty of which were due to sequencing errors and
ten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of the
sequence was from areas of single-clone coverage, and ,0.2% was from one
strand with only one sequencing chemistry.
Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and a
customized perl script that merges sequences from different libraries and
generates segments that can be processed by several finishers simultaneously.
Sequence analysis and annotation was managed by DIANA (B.G.B. et al.,
unpublished). Genes encoding proteins were identified by TB-parse46 using a
hidden Markov model trained on known M. tuberculosis coding and noncoding regions and translation-initiation signals, with corroboration by positional base preference. Interrogation of the EMBL, TREMBL, SwissProt,
PROSITE47 and in-house databases involved BLASTN, BLASTX48, DOTTER
(http://www.sanger.ac.uk) and FASTA49. tRNA genes were located and identified using tRNAscan and tRNAscan-SE50. The complete sequence, a list of
annotated cosmids and linking regions can be found on our website (http://
www. sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/).
Received 15 April; accepted 8 May 1998.
1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed.
Bloom, B. R.) 211 (Am. Soc. Microbiol., Washington DC, 1994).
2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
353385 (Am. Soc. Microbiol., Washington DC, 1994).
3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
271284 (Am. Soc. Microbiol., Washington DC, 1994).
4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
271284 (Am. Soc. Microbiol., Washington DC, 1994).
5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry

Nature Macmillan Publishers Ltd 1998

543

article
and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263270
(1997).
6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis
complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869
9874 (1997).
7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for
genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 22212229 (1998).
8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacterium
tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132
3137 (1996).
9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 14531462
(1997).
10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139160 (1994).
11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394401
(1997).
12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to
Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 1096110966 (1997).
13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic
differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 12741282
(1996).
14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature
390, 249256 (1997).
15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res.
7, 802819 (1997).
16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).
17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 5394 (Academic,
San Diego, 1982).
18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium
tuberculosis. Microb. Comp. Genomics 2, 6373 (1997).
19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S713S
(1995).
20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 21182202 (ASM,
Washington, 1996).
21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis b-ketoacyl ACP synthase by isoniazid.
Science 280, 16071610 (1998).
22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium
tuberculosis. Science 263, 227230 (1994).
23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465
2497 (1997).
24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95184 (Academic,
London, 1982).
25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and phsyiological functions. Prog. Lipid
Res. (in the press).
26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis.
Science 276, 14201422 (1997).
27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in
nonribosomal peptide synthesis. Chem. Rev. 97, 26512673 (1997).
28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a
family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 51895193 (1995).
29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon,
G. S.) 631645 (Marcel Dekker, New York, 1997).
30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization
of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63,

544

17101717 (1995).
31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major
polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology
of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 41574165 (1992).
32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS)
present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 8795 (1995).
33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis
Foundation Symp. 217) 160172 (Wiley, Chichester, 1998).
34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen from
Mycobacterium leprae. Infect. Immun. 61, 21452153 (1993).
35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectinbinding proteins. Infect. Immun. 59, 27122718 (1991).
36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422427
(1992).
37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr
virus nuclear antigen-1. Nature 375, 685688 (1995).
38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/
proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus
nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 1261612621 (1997).
39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatability
complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection.
Proc. Natl Acad. Sci. USA 89, 1201312017 (1992).
40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
531557 (Am. Soc. Microbiol., Washington DC, 1994).
41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426430 (1996).
42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis
DNA fragment associated with entry and survival inside cells. Science 261, 14541457 (1993).
43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in
survival within macrophages. Infect. Immun. 62, 16231630 (1994).
44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in
genome sequencing. Genome Res. 8, 562566 (1998).
45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res.
24, 49924999 (1995).
46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic
Acids Res. 22, 47684778 (1994).
47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25,
217221 (1997).
48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol.
Biol. 215, 403410 (1990).
49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. USA
85, 24442448 (1988).
50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in
genomic DNA. Nucleic Acids Res. 25, 955964 (1997).
Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones,
M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported
by the Wellcome Trust. Additional funding was provided by the Association Francaise Raoul Follereau,
the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling
research fellowship.
Correspondence and requests for materials should be addressed to B.G.B. (barrell@sanger.ac.uk) or S.T.C.
(stcole@pasteur.fr). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV,
accession number AL123456.

Nature Macmillan Publishers Ltd 1998

NATURE | VOL 393 | 11 JUNE 1998

Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes


I. Small-molecule metabolism
A. Degradation
1. Carbon compounds
Rv0186
bglS
-glucosidase
Rv2202c cbhK
carbohydrate kinase
Rv0727c fucA
L-fuculose phosphate aldolase
Rv1731
gabD1 succinate-semialdehyde dehydrogenase
Rv0234c gabD2
succinate-semialdehyde dehydrogenase
Rv0501
galE1
UDP-glucose 4-epimerase
Rv0536
galE2
UDP-glucose 4-epimerase
Rv0620
galK
galactokinase
Rv0619
galT
galactose-1-phosphate uridylyltransferase C-term
Rv0618
galT'
galactose-1-phosphate uridylyltransferase N-term
Rv0993
galU
UTP-glucose-1-phosphate uridylyltransferase
Rv3696c glpK
ATP:glycerol 3-phosphotransferase
Rv3255c manA
mannose-6-phosphate isomerase
Rv3441c mrsA
phosphoglucomutase or phosphomannomutase
Rv0118c oxcA
oxalyl-CoA decarboxylase
Rv3068c pgmA
phosphoglucomutase
Rv3257c pmmA
phosphomannomutase
Rv3308
pmmB
phosphomannomutase
Rv2702
ppgK
polyphosphate glucokinase
Rv0408
pta
phosphate acetyltransferase
Rv0729
xylB
xylulose kinase
Rv1096
carbohydrate degrading enzyme
2. Amino acids and amines
Rv1905c aao
D-amino acid oxidase
Rv2531c adi
ornithine/arginine decarboxylase
Rv2780
ald
L-alanine dehydrogenase
Rv1538c ansA
L-asparaginase
Rv1001
arcA
arginine deiminase
Rv0753c mmsA
methylmalmonate semialdehyde
dehydrogenase
Rv0751c mmsB
methylmalmonate semialdehyde
oxidoreductase
Rv1187
rocA
pyrroline-5-carboxylate dehydrogenase
Rv2322c rocD1
ornithine aminotransferase
Rv2321c rocD2
ornithine aminotransferase
Rv1848
ureA
urease subunit
Rv1849
ureB
urease subunit
Rv1850
ureC
urease subunit
Rv1853
ureD
urease accessory protein
Rv1851
ureF
urease accessory protein
Rv1852
ureG
urease accessory protein
Rv2913c probable D-amino acid
aminohydrolase
Rv3551
possible glutaconate CoAtransferase
3. Fatty acids
Rv2501c accA1
Rv0973c

accA2

Rv2502c

accD1

Rv0974c

accD2

Rv3667
Rv3409c
Rv0222

acs
choD
echA1

Rv0456c

echA2

Rv0632c

echA3

Rv0673

echA4

Rv0675

echA5

Rv0905

echA6

Rv0971c

echA7

Rv1070c

echA8

Rv1071c

echA9

Rv1142c

echA10

Rv1141c

echA11

Rv1472

echA12

Rv1935c

echA13

Rv2486

echA14

Rv2679

echA15

acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl-CoA synthase
cholesterol oxidase
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily (aka eccH)
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase

Rv2831
Rv3039c
Rv3373
Rv3374
Rv3516
Rv3550
Rv3774
Rv0859
Rv0243
Rv1074c
Rv1323
Rv3546
Rv3556c
Rv0860
Rv0468
Rv1715
Rv3141
Rv1912c
Rv1750c
Rv0270
Rv3561
Rv0214
Rv0166
Rv1206
Rv0119
Rv0551c
Rv2590
Rv0099
Rv1550
Rv1549
Rv1427c
Rv3089
Rv1058
Rv2187
Rv0852
Rv3506
Rv3513c
Rv3515c
Rv1185c
Rv2948c
Rv3826
Rv1529
Rv1521
Rv2930
Rv0275c
Rv2941
Rv2950c
Rv0404
Rv1925
Rv3801c
Rv1345
Rv0035
Rv2505c
Rv1193
Rv0131c
Rv0154c
Rv0215c
Rv0231
Rv0244c
Rv0271c
Rv0400c
Rv0672
Rv0752c
Rv0873
Rv0972c
Rv0975c
Rv1346
Rv1467c
Rv1679
Rv1934c
Rv1933c
Rv2500c
Rv2724c
Rv2789c
Rv3061c
Rv3140
Rv3139
Rv3274c
Rv3504
Rv3505
Rv3544c

superfamily
enoyl-CoA hydratase/isomerase
superfamily
echA17 enoyl-CoA hydratase/isomerase
superfamily
echA18 enoyl-CoA hydratase/isomerase
superfamily, N-term
echA18' enoyl-CoA hydratase/isomerase
superfamily, C-term
echA19 enoyl-CoA hydratase/isomerase
superfamily
echA20 enoyl-CoA hydratase/isomerase
superfamily
echA21 enoyl-CoA hydratase/isomerase
superfamily
fadA
oxidation complex, subunit
(acetyl-CoA C-acetyltransferase)
fadA2
acetyl-CoA C-acetyltransferase
fadA3
acetyl-CoA C-acetyltransferase
fadA4
acetyl-CoA C-acetyltransferase
(aka thiL)
fadA5
acetyl-CoA C-acetyltransferase
fadA6
acetyl-CoA C-acetyltransferase
fadB
oxidation complex, subunit
(multiple activities)
fadB2
3-hydroxyacyl-CoA dehydrogenase
fadB3
3-hydroxyacyl-CoA dehydrogenase
fadB4
3-hydroxyacyl-CoA dehydrogenase
fadB5
3-hydroxyacyl-CoA dehydrogenase
fadD1
acyl-CoA synthase
fadD2
acyl-CoA synthase
fadD3
acyl-CoA synthase
fadD4
acyl-CoA synthase
fadD5
acyl-CoA synthase
fadD6
acyl-CoA synthase
fadD7
acyl-CoA synthase
fadD8
acyl-CoA synthase
fadD9
acyl-CoA synthase
fadD10 acyl-CoA synthase
fadD11 acyl-CoA synthase, N-term
fadD11' acyl-CoA synthase, C-term
fadD12 acyl-CoA synthase
fadD13 acyl-CoA synthase
fadD14 acyl-CoA synthase
fadD15 acyl-CoA synthase
fadD16 acyl-CoA synthase
fadD17 acyl-CoA synthase
fadD18 acyl-CoA synthase
fadD19 acyl-CoA synthase
fadD21 acyl-CoA synthase
fadD22 acyl-CoA synthase
fadD23 acyl-CoA synthase
fadD24 acyl-CoA synthase
fadD25 acyl-CoA synthase
fadD26 acyl-CoA synthase
fadD27 acyl-CoA synthase
fadD28 acyl-CoA synthase
fadD29 acyl-CoA synthase
fadD30 acyl-CoA synthase
fadD31 acyl-CoA synthase
fadD32 acyl-CoA synthase
fadD33 acyl-CoA synthase
fadD34 acyl-CoA synthase
fadD35 acyl-CoA synthase
fadD36 acyl-CoA synthase
fadE1
acyl-CoA dehydrogenase
fadE2
acyl-CoA dehydrogenase
fadE3
acyl-CoA dehydrogenase
fadE4
acyl-CoA dehydrogenase
fadE5
acyl-CoA dehydrogenase
fadE6
acyl-CoA dehydrogenase
fadE7
acyl-CoA dehydrogenase
fadE8
acyl-CoA dehydrogenase
(aka aidB)
fadE9
acyl-CoA dehydrogenase
fadE10 acyl-CoA dehydrogenase
fadE12 acyl-CoA dehydrogenase
fadE13 acyl-CoA dehydrogenase
fadE14 acyl-CoA dehydrogenase
fadE15 acyl-CoA dehydrogenase
fadE16 acyl-CoA dehydrogenase
fadE17 acyl-CoA dehydrogenase
fadE18 acyl-CoA dehydrogenase
fadE19 acyl-CoA dehydrogenase
(aka mmgC)
fadE20 acyl-CoA dehydrogenase
fadE21 acyl-CoA dehydrogenase
fadE22 acyl-CoA dehydrogenase
fadE23 acyl-CoA dehydrogenase
fadE24 acyl-CoA dehydrogenase
fadE25 acyl-CoA dehydrogenase
fadE26 acyl-CoA dehydrogenase
fadE27 acyl-CoA dehydrogenase
fadE28 acyl-CoA dehydrogenase

echA16

Nature Macmillan Publishers Ltd 1998

Rv3543c
Rv3560c
Rv3562
Rv3563
Rv3564
Rv3573c
Rv3797
Rv3761c
Rv1175c
Rv0855
Rv1143
Rv1492

fadE29
fadE30
fadE31
fadE32
fadE33
fadE34
fadE35
fadE36
fadH
far
mcr
mutA

Rv1493

mutB

Rv2504c

scoA

Rv2503c

scoB

Rv1136
Rv1683

acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
2,4-Dienoyl-CoA Reductase
fatty acyl-CoA racemase
-methyl acyl-CoA racemase
methylmalonyl-CoA mutase,
subunit
methylmalonyl-CoA mutase,
subunit
3-oxo acid:CoA transferase, subunit
3-oxo acid:CoA transferase, subunit
probable carnitine racemase
possible acyl-CoA synthase

4. Phosphorous compounds
Rv2368c phoH
ATP-binding pho regulon
component
Rv1095
phoH2 PhoH-like protein
Rv3628
ppa
probable inorganic pyrophosphatase
Rv2984
ppk
polyphosphate kinase

B. Energy metabolism
1. Glycolysis
Rv1023
eno
enolase
Rv0363c fba
fructose bisphosphate aldolase
Rv1436
gap
glyceraldehyde 3-phosphate dehydrogenase
Rv0489
gpm
phosphoglycerate mutase I
Rv3010c pfkA
phosphofructokinase I
Rv2029c pfkB
phosphofructokinase II
Rv0946c pgi
glucose-6-phosphate isomerase
Rv1437
pgk
phosphoglycerate kinase
Rv1617
pykA
pyruvate kinase
Rv1438
tpi
triosephosphate isomerase
Rv2419c putative phosphoglycerate mutase
Rv3837c putative phosphoglycerate mutase
2. Pyruvate dehydrogenase
Rv2241
aceE
pyruvate dehydrogenase E1 component
Rv3303c lpdA
dihydrolipoamide dehydrogenase
Rv2497c pdhA
pyruvate dehydrogenase E1 component subunit
Rv2496c pdhB
pyruvate dehydrogenase E1 component subunit
Rv2495c pdhC
dihydrolipoamide acetyltransferase
Rv0462
probable dihydrolipoamide dehydrogenase
3. TCA cycle
Rv1475c acn
Rv0889c citA
Rv2498c citE
Rv1098c fum
Rv1131
gltA1
Rv0896
gltA2
Rv3339c icd1
Rv0066c icd2
Rv0794c lpdB
Rv1240
mdh
Rv2967c pca
Rv3318
sdhA
Rv3319
sdhB
Rv3316
sdhC
Rv3317

sdhD

Rv1248c
Rv2215

sucA
sucB

Rv0951
Rv0952

sucC
sucD

4. Glyoxylate bypass
Rv0467
aceA
Rv1915
aceAa
Rv1916
aceAb
Rv1837c glcB
Rv3323c gphA

aconitate hydratase
citrate synthase 2
citrate lyase chain
fumarase
citrate synthase 3
citrate synthase 1
isocitrate dehydrogenase
isocitrate dehydrogenase
dihydrolipoamide dehydrogenase
malate dehydrogenase
pyruvate carboxylase
succinate dehydrogenase A
succinate dehydrogenase B
succinate dehydrogenase C subunit
succinate dehydrogenase D subunit
2-oxoglutarate dehydrogenase
dihydrolipoamide succinyltransferase
succinyl-CoA synthase chain
succinyl-CoA synthase chain
isocitrate lyase
isocitrate lyase, module
isocitrate lyase, module
malate synthase
phosphoglycolate phosphatase

5. Pentose phosphate pathway


Rv1445c devB
glucose-6-phosphate 1-dehydrogenase
Rv1844c gnd
6-phosphogluconate dehydrogenase (Gram )
Rv1122
gnd2
6-phosphogluconate dehydrogenase (Gram +)
Rv1446c opcA
unknown function, may aid
G6PDH

Rv2436
Rv1408
Rv2465c
Rv1448c
Rv1449c
Rv1121

rbsK
rpe
rpi
tal
tkt
zwf

Rv1447c

zwf2

6. Respiration
a. aerobic
Rv0527
ccsA
Rv0529

ccsB

Rv1451

ctaB

Rv2200c
Rv3043c

ctaC
ctaD

Rv2193

ctaE

Rv1542c
Rv2470
Rv2249c

glbN
glbO
glpD1

Rv3302c

glpD2

Rv0694

lldD1

Rv1872c
Rv1854c
Rv3145
Rv3146
Rv3147
Rv3148
Rv3149
Rv3150
Rv3151
Rv3152
Rv3153
Rv3154
Rv3155
Rv3156
Rv3157
Rv3158
Rv2195

lldD2
ndh
nuoA
nuoB
nuoC
nuoD
nuoE
nuoF
nuoG
nuoH
nuoI
nuoJ
nuoK
nuoL
nuoM
nuoN
qcrA

Rv2196

qcrB

Rv2194

qcrC

b. anaerobic
Rv2392
cysH
Rv2899c
Rv2900c

fdhD
fdhF

Rv1552

frdA

Rv1553

frdB

Rv1554

frdC

Rv1555

frdD

Rv1161
Rv1162
Rv1164
Rv1163
Rv1736c
Rv2391

narG
narH
narI
narJ
narX
nirA

Rv0252
Rv0253

nirB
nirD

ribokinase
ribulose-phosphate 3-epimerase
phosphopentose isomerase
transaldolase
transketolase
glucose-6-phosphate 1-dehydrogenase
glucose-6-phosphate 1-dehydrogenase

cytochrome c-type biogenesis


protein
cytochrome c-type biogenesis
protein
cytochrome c oxidase assembly
factor
cytochrome c oxidase chain II
cytochrome c oxidase polypeptide I
cytochrome c oxidase polypeptide III
hemoglobin-like, oxygen carrier
hemoglobin-like, oxygen carrier
glycerol-3-phosphate dehydrogenase
glycerol-3-phosphate dehydrogenase
L-lactate dehydrogenase
(cytochrome)
L-lactate dehydrogenase
probable NADH dehydrogenase
NADH dehydrogenase chain A
NADH dehydrogenase chain B
NADH dehydrogenase chain C
NADH dehydrogenase chain D
NADH dehydrogenase chain E
NADH dehydrogenase chain F
NADH dehydrogenase chain G
NADH dehydrogenase chain H
NADH dehydrogenase chain I
NADH dehydrogenase chain J
NADH dehydrogenase chain K
NADH dehydrogenase chain L
NADH dehydrogenase chain M
NADH dehydrogenase chain N
Rieske iron-sulphur component of
ubiQ-cytB reductase
cytochrome component of ubiQcytB reductase
cytochrome b/c component of
ubiQ-cytB reductase

3'-phosphoadenylylsulfate (PAPS)
reductase
affects formate dehydrogenase-N
molybdopterin-containing oxidoreductase
fumarate reductase flavoprotein
subunit
fumarate reductase iron sulphur
protein
fumarate reductase 15kD anchor
protein
fumarate reductase 13kD anchor
protein
nitrate reductase subunit
nitrate reductase chain
nitrate reductase chain
nitrate reductase chain
fused nitrate reductase
probable nitrite reductase/sulphite
reductase
nitrite reductase flavoprotein
probable nitrite reductase small
subunit

c. Electron transport
Rv0409
ackA
acetate kinase
Rv1623c appC
cytochrome bd-II oxidase
subunit I
Rv1622c cydB
cytochrome d ubiquinol oxidase
subunit II
Rv1620c cydC
ABC transporter
Rv1621c cydD
ABC transporter
Rv2007c fdxA
ferredoxin
Rv3554
fdxB
ferredoxin
Rv1177
fdxC
ferredoxin 4Fe-4S
Rv3503c fdxD
probable ferredoxin
Rv3029c fixA
electron transfer flavoprotein
subunit
Rv3028c fixB
electron transfer flavoprotein
subunit
Rv3106
fprA
adrenodoxin and NADPH ferredoxin reductase
Rv0886
fprB
ferredoxin, ferredoxin-NADP
reductase
Rv3251c rubA
rubredoxin A

Rv3250c

rubB

rubredoxin B

7. Miscellaneous oxidoreductases and oxygenases 171


8. ATP-proton motive force
Rv1308
atpA
ATP synthase
Rv1304
atpB
ATP synthase
Rv1311
atpC
ATP synthase
Rv1310
atpD
ATP synthase
Rv1305
atpE
ATP synthase
Rv1306
atpF
ATP synthase
Rv1309
atpG
ATP synthase
Rv1307
atpH
ATP synthase

chain
chain
e chain
chain
c chain
b chain
chain
chain

C. Central intermediary metabolism


1. General
Rv2589
gabT
4-aminobutyrate aminotransferase
Rv3432c gadB
glutamate decarboxylase
Rv1832
gcvB
glycine decarboxylase
Rv1826
gcvH
glycine cleavage system H protein
Rv2211c gcvT
T protein of glycine cleavage
system
Rv1213
glgC
glucose-1-phosphate adenylyltransferase
Rv3842c glpQ1
glycerophosphoryl diester phosphodiesterase
Rv0317c glpQ2
glycerophosphoryl diester phosphodiesterase
Rv3566c nhoA
N-hydroxyarylamine o-acetyltransferase
Rv0155
pntAA
pyridine transhydrogenase subunit 1
Rv0156
pntAB
pyridine transhydrogenase subunit 2
Rv0157
pntB
pyridine transhydrogenase
subunit
Rv1127c ppdK
similar to pyruvate, phosphate
dikinase
2. Gluconeogenesis
Rv0211
pckA
phosphoenolpyruvate carboxykinase
Rv0069c sdaA
L-serine dehydratase 1
3. Sugar nucleotides
Rv1512
epiA
nucleotide sugar epimerase
Rv3784
epiB
probable UDP-galactose 4epimerase
Rv1511
gmdA
GDP-mannose 4,6 dehydratase
Rv0334
rmlA
glucose-1-phosphate thymidyltransferase
Rv3264c rmlA2
glucose-1-phosphate thymidyltransferase
Rv3464
rmlB
dTDP-glucose 4,6-dehydratase
Rv3634c rmlB2
dTDP-glucose 4,6-dehydratase
Rv3468c rmlB3
dTDP-glucose 4,6-dehydratase
Rv3465
rmlC
dTDP-4-dehydrorhamnose
3,5-epimerase
Rv3266c rmlD
dTDP-4-dehydrorhamnose
reductase
Rv0322
udgA
UDP-glucose
dehydrogenase/GDP-mannose 6dehydrogenase
Rv3265c wbbL
dTDP-rhamnosyl transferase
Rv1525
wbbl2
dTDP-rhamnosyl transferase
Rv3400
probable -phosphoglucomutase
4. Amino sugars
Rv3436c glmS

5. Sulphur
Rv0711
Rv3299c
Rv0663
Rv3077
Rv0296c
Rv3796
Rv1285
Rv1286
Rv2131c
Rv3248c
Rv3283
Rv2291
Rv3118
Rv0814c
Rv3762c

glucosamine-fructose-6phosphate aminotransferase

metabolism
atsA
arylsulfatase
atsB
proable arylsulfatase
atsD
proable arylsulfatase
atsF
proable arylsulfatase
atsG
proable arylsulfatase
atsH
proable arylsulfatase
cysD
ATP:sulphurylase subunit 2
cysN
ATP:sulphurylase subunit 1
cysQ
homologue of M.leprae cysQ
sahH
adenosylhomocysteinase
sseA
thiosulfate sulfurtransferase
sseB
thiosulfate sulfurtransferase
sseC
thiosulfate sulfurtransferase
sseC2
thiosulfate sulfurtransferase
probable alkyl sulfatase

D. Amino acid biosynthesis


1. Glutamate family
Rv1654
argB
acetylglutamate kinase
Rv1652
argC
N-acetyl--glutamyl-phosphate
reductase
Rv1655
argD
acetylornithine aminotransferase
Rv1656
argF
ornithine carbamoyltransferase
Rv1658
argG
arginosuccinate synthase
Rv1659
argH
arginosuccinate lyase
Rv1653
argJ
glutamate N-acetyltransferase
Rv2220
glnA1
glutamine synthase class I
Rv2222c glnA2
glutamine synthase class II

Nature Macmillan Publishers Ltd 1998

Rv1878
Rv2860c
Rv2918c
Rv2221c

glnA3
glnA4
glnD
glnE

Rv3859c

gltB

Rv3858c

gltD

Rv3704c

gshA

Rv2427c
Rv2439c
Rv0500

proA
proB
proC

2. Aspartate family
Rv3708c asd
Rv3709c
Rv2201
Rv3565
Rv0337c
Rv2753c
Rv2773c
Rv1202

ask
asnB
aspB
aspC
dapA
dapB
dapE

Rv2141c

dapE2

Rv2726c
Rv1293
Rv3341
Rv1079
Rv3340
Rv1133c

dapF
lysA
metA
metB
metC
metE

Rv2124c

metH

Rv1392
Rv0391

metK
metZ

Rv1294
Rv1296
Rv1295

thrA
thrB
thrC

3. Serine family
Rv0815c cysA2
Rv3117
cysA3
Rv2335
cysE
Rv0511
cysG
Rv2847c

cysG2

Rv2334
Rv1336
Rv1077
Rv0848
Rv1093
Rv0070c
Rv2996c

cysK
cysM
cysM2
cysM3
glyA
glyA2
serA

Rv0505c

serB

Rv3042c

serB2

Rv0884c

serC

probable glutamine synthase


proable glutamine synthase
uridylyltransferase
glutamate-ammonia-ligase
adenyltransferase
ferredoxin-dependent glutamate
synthase
small subunit of NADH-dependent
glutamate synthase
possible -glutamylcysteine synthase
-glutamyl phosphate reductase
glutamate 5-kinase
pyrroline-5-carboxylate reductase

aspartate semialdehyde dehydrogenase


aspartokinase
asparagine synthase B
aspartate aminotransferase
aspartate aminotransferase
dihydrodipicolinate synthase
dihydrodipicolinate reductase
succinyl-diaminopimelate desuccinylase
ArgE/DapE/Acy1/Cpg2/yscS
family
diaminopimelate epimerase
diaminopimelate decarboxylase
homoserine o-acetyltransferase
cystathionine -synthase
cystathionine -lyase
5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase
5-methyltetrahydrofolate-homocysteine methyltransferase
S-adenosylmethionine synthase
o-succinylhomoserine sulfhydrylase
homoserine dehydrogenase
homoserine kinase
homoserine synthase

thiosulfate sulfurtransferase
thiosulfate sulfurtransferase
serine acetyltransferase
uroporphyrin-III c-methyltransferase
multifunctional enzyme, siroheme
synthase
cysteine synthase A
cysteine synthase B
cystathionine -synthase
putative cysteine synthase
serine hydroxymethyltransferase
serine hydroxymethyltransferase
D-3-phosphoglycerate dehydrogenase
probable phosphoserine phosphatase
C-term similar to phosphoserine
phosphatase
phosphoserine aminotransferase

4. Aromatic amino acid family


Rv3227
aroA
3-phosphoshikimate
1-carboxyvinyl transferase
Rv2538c aroB
3-dehydroquinate synthase
Rv2537c aroD
3-dehydroquinate dehydratase
Rv2552c aroE
shikimate 5-dehydrogenase
Rv2540c aroF
chorismate synthase
Rv2178c aroG
DAHP synthase
Rv2539c aroK
shikimate kinase I
Rv3838c pheA
prephenate dehydratase
Rv1613
trpA
tryptophan synthase chain
Rv1612
trpB
tryptophan synthase chain
Rv1611
trpC
indole-3-glycerol phosphate
synthase
Rv2192c trpD
anthranilate phosphoribosyltransferase
Rv1609
trpE
anthranilate synthase
component I
Rv2386c trpE2
anthranilate synthase
component I
Rv3754
tyrA
prephenate dehydrogenase
5. Histidine
Rv1603
hisA

Rv1601

hisB

Rv1600

hisC

Rv3772

hisC2

Rv1599

hisD

phosphoribosylformimino-5aminoimidazole carboxamide
ribonucleotide isomerase
imidazole glycerol-phosphate
dehydratase
histidinol-phosphate aminotransferase
histidinol-phosphate aminotransferase
histidinol dehydrogenase

Rv1605

hisF

Rv2121c
Rv1602
Rv2122c

hisG
hisH
hisI

Rv1606

hisI2

Rv0114

6. Pyruvate family
Rv3423c alr

imidazole glycerol-phosphate
synthase
ATP phosphoribosyltransferase
amidotransferase
phosphoribosyl-AMP cyclohydrolase
probable phosphoribosyl-AMP 1,6
cyclohydrolase
similar to HisB

Rv3048c

nrdG

Rv3053c

nrdH

Rv3052c
Rv3247c
Rv2764c
Rv0570
Rv3752c

nrdI
tmk
thyA
nrdZ
-

subunit
ribonucleoside-diphosphate small
subunit
glutaredoxin electron transport
component of NrdEF system
NrdI/YgaO/YmaA family
thymidylate kinase
thymidylate synthase
ribonucleotide reductase, class II
probable cytidine/deoxycytidylate
deaminase

alanine racemase

7. Branched amino acid family


Rv1559
ilvA
threonine deaminase
Rv3003c ilvB
acetolactate synthase I large subunit
Rv3470c ilvB2
acetolactate synthase large subunit
Rv3001c ilvC
ketol-acid reductoisomerase
Rv0189c ilvD
dihydroxy-acid dehydratase
Rv2210c ilvE
branched-chain-amino-acid
transaminase
Rv1820
ilvG
acetolactate synthase II
Rv3002c ilvN
acetolactate synthase I small subunit
Rv3509c ilvX
probable acetohydroxyacid synthase I large subunit
Rv3710
leuA
-isopropyl malate synthase
Rv2995c leuB
3-isopropylmalate dehydrogenase
Rv2988c leuC
3-isopropylmalate dehydratase
large subunit
Rv2987c leuD
3-isopropylmalate dehydratase
small subunit

E. Polyamine synthesis
Rv2601
speE
spermidine synthase
F. Purines, pyrimidines, nucleosides and nucleotides
1. Purine ribonucleotide biosynthesis
Rv1389
gmk
putative guanylate kinase
Rv3396c guaA
GMP synthase
Rv1843c guaB1
inosine-5'-monophosphate dehydrogenase
Rv3411c guaB2
inosine-5'-monophosphate dehydrogenase
Rv3410c guaB3
inosine-5'-monophosphate dehydrogenase
Rv1017c prsA
ribose-phosphate pyrophosphokinase
Rv0357c purA
adenylosuccinate synthase
Rv0777
purB
adenylosuccinate lyase
Rv0780
purC
phosphoribosylaminoimidazolesuccinocarboxamide synthase
Rv0772
purD
phosphoribosylamine-glycine ligase
Rv3275c purE
phosphoribosylaminoimidazole
carboxylase
Rv0808
purF
amidophosphoribosyltransferaseRv0957
purH
phosphoribosylaminoimidazolecarboxamide formyltransferase
Rv3276c purK
phosphoribosylaminoimidazole
carboxylase ATPase subunit
Rv0803
purL
phosphoribosylformylglycinamidine synthase II
Rv0809
purM
5'-phosphoribosyl-5-aminoimidazole synthase
Rv0956
purN
phosphoribosylglycinamide
formyltransferase I
Rv0788
purQ
phosphoribosylformylglycinamidine synthase I
Rv0389
purT
phosphoribosylglycinamide
formyltransferase II
Rv2964
purU
formyltetrahydrofolate deformylase

4. Salvage of nucleosides and nucleotides


Rv3313c add
probable adenosine deaminase
Rv2584c apt
adenine phosphoribosyltransferases
Rv3315c cdd
probable cytidine deaminase
Rv3314c deoA
thymidine phosphorylase
Rv0478
deoC
deoxyribose-phosphate aldolase
Rv3307
deoD
probable purine nucleoside phosphorylase
Rv3624c hpt
probable hypoxanthine-guanine
phosphoribosyltransferase
Rv3393
iunH
probable inosine-uridine
preferring nucleoside hydrolase
Rv0535
pnp
phosphorylase from Pnp/MtaP
family 2
Rv3309c upp
uracil phophoribosyltransferase
5. Miscellaneous nucleoside/nucleotide reactions
Rv0733
adk
probable adenylate kinase
Rv2364c bex
GTP-binding protein of Era/ThdF
family
Rv1712
cmk
cytidylate kinase
Rv2344c dgt
probable deoxyguanosine
triphosphate hydrolase
Rv2404c lepA
GTP-binding protein LepA
Rv2727c miaA
tRNA (2)-isopentenylpyrophosphate transferase
Rv2445c ndkA
nucleoside diphosphate kinase
Rv2440c obg
Obg GTP-binding protein
Rv2583c relA
(p)ppGpp synthase I

G. Biosynthesis of cofactors, prosthetic groups and


carriers
1. Biotin
Rv1568
bioA
adenosylmethionine-8-amino-7oxononanoate aminotransferase
Rv1589
bioB
biotin synthase
Rv1570
bioD
dethiobiotin synthase
Rv1569
bioF
8-amino-7-oxononanoate
synthase
Rv0032
bioF2
C-terminal similar to B. subtilis
BioF
Rv3279c birA
biotin apo-protein ligase
Rv1442
bisC
biotin sulfoxide reductase
Rv0089
possible bioC biotin synthesis
gene
2. Folic acid
Rv2763c dfrA
Rv2447c folC
Rv3356c folD
Rv3609c
Rv3606c

folE
folK

Rv3608c
Rv1207
Rv3607c

folP
folP2
folX

Rv0013

pabA

Rv1005c
Rv0812

pabB
pabC

2. Pyrimidine ribonucleotide biosynthesis


Rv1383
carA
carbamoyl-phosphate synthase
subunit
Rv1384
carB
carbamoyl-phosphate synthase
subunit
Rv1380
pyrB
aspartate carbamoyltransferase
Rv1381
pyrC
dihydroorotase
Rv2139
pyrD
dihydroorotate dehydrogenase
Rv1385
pyrF
orotidine 5'-phosphate decarboxylase
Rv1699
pyrG
CTP synthase
Rv2883c pyrH
uridylate kinase
Rv0382c umpA
probable uridine 5'-monophosphate synthase

3. Lipoate
Rv2218
lipA
Rv2217
lipB

3. 2'-deoxyribonucleotide metabolism
Rv0321
dcd
deoxycytidine triphosphate
deaminase
Rv2697c dut
deoxyuridine triphosphatase
Rv0233
nrdB
ribonucleoside-diphosphate
reductase B2 (eukaryotic-like)
Rv3051c nrdE
ribonucleoside diphosphate
reductase chain
Rv1981c nrdF
ribonucleotide reductase small

4. Molybdopterin
Rv3109
moaA
Rv0869c

moaA2

Rv0438c

moaA3

Rv3110

moaB

Rv0984

moaB2

Rv3111

moaC

Rv0864

moaC2

Rv3324c

moaC3

Rv3112

moaD

Rv0868c

moaD2

dihydrofolate reductase
folylpolyglutamate synthase
methylenetetrahydrofolate dehydrogenase
GTP cyclohydrolase I
7,8-dihydro-6-hydroxymethylpterin
pyrophosphokinase
dihydropteroate synthase
dihydropteroate synthase
may be involved in folate biosynthesis
p-aminobenzoate synthase glutamine amidotransferase
p-aminobenzoate synthase
aminodeoxychorismate lyase

lipoate biosynthesis protein A


lipoate biosynthesis protein B

molybdenum cofactor biosynthesis, protein A


molybdenum cofactor biosynthesis, protein A
molybdenum cofactor biosynthesis, protein A
molybdenum cofactor biosynthesis, protein B
molybdenum cofactor biosynthesis, protein B
molybdenum cofactor biosynthesis, protein C
molybdenum cofactor biosynthesis, protein C
molybdenum cofactor biosynthesis, protein C
molybdopterin converting factor
subunit 1
molybdopterin converting factor

Nature Macmillan Publishers Ltd 1998

Rv3119

moaE

Rv0866

moaE2

Rv3322c

moaE3

Rv0994
Rv3116
Rv2338c
Rv1681
Rv1355c
Rv3206c

moeA
moeB
moeW
moeX
moeY
moeZ

Rv0865

mog

5. Pantothenate
Rv1092c coaA
Rv2225
panB

panC
panD

Rv3602c
Rv3601c

6. Pyridoxine
Rv2607
pdxH

7. Pyridine
Rv1594
Rv1595
Rv1596
Rv0423c

subunit 1
molybdopterin-converting factor
subunit 2
molybdopterin-converting factor
subunit 2
molybdopterin-converting factor
subunit 2
molybdopterin biosynthesis
molybdopterin biosynthesis
molybdopterin biosynthesis
weak similarity to E. coli MoaA
weak similarity to E. coli MoeB
probably involved in
molybdopterin biosynthesis
molybdopterin biosynthesis

pantothenate kinase
3-methyl-2-oxobutanoate
hydroxymethyltransferase
pantoate--alanine ligase
aspartate 1-decarboxylase

pyridoxamine 5'-phosphate
oxidase

nucleotide
nadA
quinolinate synthase
nadB
L-aspartate oxidase
nadC
nicotinate-nucleotide pyrophosphatase
thiC
thiamine synthesis, pyrimidine
moiety

8. Thiamine
Rv0422c thiD
Rv0414c thiE
Rv0417

thiG

Rv2977c

thiL

9. Riboflavin
Rv1940
ribA
Rv1415
ribA2
Rv1412
ribC
Rv2671
ribD
Rv2786c ribF
Rv1409
ribG
Rv1416
ribH
Rv3300c -

phosphomethylpyrimidine kinase
thiamine synthesis, thiazole
moiety
thiamine synthesis, thiazole
moiety
probable thiamine-monophosphate kinase

GTP cyclohydrolase II
probable GTP cyclohydrolase II
riboflavin synthase chain
probable riboflavin deaminase
riboflavin kinase
riboflavin biosynthesis
riboflavin synthase chain
probable deaminase, riboflavin
synthesis

10. Thioredoxin, glutaredoxin and mycothiol


Rv0773c ggtA
putative -glutamyl transpeptidase
Rv2394
ggtB
-glutamyltranspeptidase
precursor
Rv2855
gorA
glutathione reductase homologue
Rv0816c thiX
equivalent to M. leprae ThiX
Rv1470
trxA
thioredoxin
Rv1471
trxB
thioredoxin reductase
Rv3913
trxB2
thioredoxin reductase
Rv3914
trxC
thioredoxin
11. Menaquinone, PQQ, ubiquinone and other
terpenoids
Rv2682c dxs
1-deoxy-D-xylulose 5-phosphate
synthase
Rv0562
grcC1
heptaprenyl diphosphate
synthase II
Rv0989c grcC2
heptaprenyl diphosphate
synthase II
Rv3398c idsA
geranylgeranyl pyrophosphate
synthase
Rv2173
idsA2
geranylgeranyl pyrophosphate
synthase
Rv3383c idsB
transfergeranyl, similar geranyl
pyrophosphate synthase
Rv0534c menA
4-dihydroxy-2-naphthoate
octaprenyltransferase
Rv0548c menB
naphthoate synthase
Rv0553
menC
o-succinylbenzoate-CoA synthase
Rv0555
menD
2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase
Rv0542c menE
o-succinylbenzoic acid-CoA ligase
Rv3853
menG
S-adenosylmethionine:
2-demethylmenaquinone
Rv3397c phyA
phytoene synthase
Rv0693
pqqE
coenzyme PQQ synthesis
protein E
Rv0558
ubiE
ubiquinone/menaquinone biosynthesis methyltransferase
12. Heme
Rv0509
Rv0512
Rv0510
Rv2678c

and porphyrin
hemA
glutamyl-tRNA reductase
hemB
-aminolevulinic acid dehydratase
hemC
porphobilinogen deaminase
hemE
uroporphyrinogen decarboxylase

Rv1300
Rv0524

hemK
hemL

Rv2388c

hemN

Rv2677c
Rv1485

hemY'
hemZ

13. Cobalamin
Rv2849c cobA
Rv2848c cobB
Rv2231c
Rv2236c
Rv2064
Rv2065
Rv2066
Rv2070c
Rv2072c
Rv2071c
Rv2062c
Rv2208

cobC
cobD
cobG
cobH
cobI
cobK
cobL
cobM
cobN
cobS

Rv2207

cobT

Rv0254c
Rv0255c
Rv3713
Rv0306

cobU
cobQ
cobQ2
-

14. Iron utilization


Rv1876
bfrA
Rv3841
bfrB
Rv3215
entC
Rv3214
entD
Rv2895c

viuB

Rv3525c

protoporphyrinogen oxidase
glutamate-1-semialdehyde aminotransferase
oxygen-independent coproporphyrinogen III oxidase
protoporphyrinogen oxidase
ferrochelatase

cob(I)alamin adenosyltransferase
cobyrinic acid a,c-diamide
synthase
aminotransferase
cobinamide synthase
percorrin reductase
precorrin isomerase
CobI-CobJ fusion protein
precorrin reductase
probable methyltransferase
precorrin-3 methylase
cobalt insertion
cobalamin (5'-phosphate)
synthase
nicotinate-nucleotide-dimethylbenzimidazole transferase
cobinamide kinase
cobyric acid synthase
possible cobyric acid synthase
similar to BluB cobalamin synthesis protein R. capsulatus

bacterioferritin
bacterioferritin
probable isochorismate synthase
weak similarity to many phosphoglycerate mutases
similar to proteins involved in
vibriobactin uptake
similar to ferripyochelin binding
protein

H. Lipid biosynthesis
1. Synthesis of fatty and mycolic acids
Rv3285
accA3
acetyl/propionyl CoA carboxylase
subunit
Rv0904c accD3
acetyl/propionyl CoA carboxylase
subunit
Rv3799c accD4
acetyl/propionyl CoA carboxylase
subunit
Rv3280
accD5
acetyl/propionyl CoA carboxylase
subunit
Rv2247
accD6
acetyl/propionyl CoA carboxylase
subunit
Rv2244
acpM
acyl carrier protein (meromycolate
extension)
Rv2523c acpS
CoA:apo-[ACP] pantethienephosphotransferase
Rv2243
fabD
malonyl CoA-[ACP] transacylase
Rv0649
fabD2
malonyl CoA-[ACP] transacylase
Rv1483
fabG1
3-oxoacyl-[ACP] reductase (aka
MabA)
Rv1350
fabG2
3-oxoacyl-[ACP] Reductase
Rv2002
fabG3
3-oxoacyl-[ACP] reductase
Rv0242c fabG4
3-oxoacyl-[ACP] reductase
Rv2766c fabG5
3-oxoacyl-[ACP] reductase
Rv0533c fabH
-ketoacyl-ACP synthase III
Rv2524c fas
fatty acid synthase
Rv1484
inhA
enoyl-[ACP] reductase
Rv2245
kasA
-ketoacyl-ACP synthase
(meromycolate extension)
Rv2246
kasB
-ketoacyl-ACP synthase
(meromycolate extension)
Rv1618
tesB1
thioesterase II
Rv2605c tesB2
thioesterase II
Rv0033
possible acyl carrier protein
Rv1344
possible acyl carrier protein
Rv1722
possible biotin carboxylase
Rv3221c resembles biotin carboxyl carrier
Rv3472
possible acyl carrier protein
2. Modification of fatty and mycolic acids
Rv3391
acrA1
fatty acyl-CoA reductase
Rv3392c cmaA1 cyclopropane mycolic acid
synthase 1
Rv0503c cmaA2 cyclopropane mycolic acid synthase 2
Rv0824c desA1
acyl-[ACP] desaturase
Rv1094
desA2
acyl-[ACP] desaturase
Rv3229c desA3
acyl-[ACP] desaturase
Rv0645c mmaA1 methoxymycolic acid synthase 1
Rv0644c mmaA2 methoxymycolic acid synthase 2
Rv0643c mmaA3 methoxymycolic acid synthase 3
Rv0642c mmaA4 methoxymycolic acid synthase 4
Rv0447c ufaA1
unknown fatty acid methyltransferase
Rv3538
ufaA2
unknown fatty acid methyltransferase
Rv0469
umaA1 unknown mycolic acid methyltransferase

Rv0470c

umaA2

unknown mycolic acid methyltransferase

3. Acyltransferases, mycoloyltransferases and


phospholipid synthesis
Rv2289
cdh
CDP-diacylglycerol phosphatidylhydrolase
Rv2881c cdsA
phosphatidate cytidylyltransferase
Rv3804c fbpA
antigen 85A, mycolyltransferase
Rv1886c fbpB
antigen 85B, mycolyltransferase
Rv3803c fbpC1
antigen 85C, mycolyltransferase
Rv0129c fbpC2
antigen 85C', mycolytransferase
Rv0564c gpdA1
glycerol-3-phosphate dehydrogenase
Rv2982c gpdA2
glycerol-3-phosphate dehydrogenase
Rv2612c pgsA
CDP-diacylglycerol-glycerol-3phosphate phosphatidyltransferase
Rv1822
pgsA2
CDP-diacylglycerol-glycerol-3phosphate phosphatidyltransferase
Rv2746c pgsA3
CDP-diacylglycerol-glycerol-3phosphate phosphatidyltransferase
Rv1551
plsB1
glycerol-3-phosphate acyltransferase
Rv2482c plsB2
glycerol-3-phosphate acyltransferase
Rv0437c psd
putative phosphatidylserine
decarboxylase
Rv0436c pssA
CDP-diacylglycerol-serine
o-phosphatidyltransferase
Rv0045c possible dihydrolipoamide acetyltransferase
Rv0914c lipid transfer protein
Rv1543
probable fatty-acyl CoA reductase
Rv1627c lipid carrier protein
Rv1814
possible C-5 sterol desaturase
Rv1867
similar to acetyl CoA
synthase/lipid carriers
Rv2261c apolipoprotein N-acyltransferase-a
Rv2262c apolipoprotein N-acyltransferase-b
Rv3523
lipid carrier protein
Rv3720
C-term similar to cyclopropane
fatty acid synthases

I. Polyketide and non-ribosomal peptide synthesis


Rv2940c mas
mycocerosic acid synthase
Rv2384
mbtA
mycobactin/exochelin synthesis
(salicylate-AMP ligase)
Rv2383c mbtB
mycobactin/exochelin synthesis
(serine/threonine ligation)
Rv2382c mbtC
mycobactin/exochelin synthesis
Rv2381c mbtD
mycobactin/exochelin synthesis
(polyketide synthase)
Rv2380c mbtE
mycobactin/exochelin synthesis
(lysine ligation)
Rv2379c mbtF
mycobactin/exochelin synthesis
(lysine ligation)
Rv2378c mbtG
mycobactin/exochelin synthesis
(lysine hydroxylase)
Rv2377c mbtH
mycobactin/exochelin synthesis
Rv0101
nrp
unknown non-ribosomal peptide
synthase
Rv1153c omt
PKS o-methyltransferase
Rv3824c papA1
PKS-associated protein, unknown
function
Rv3820c papA2
PKS-associated protein, unknown
function
Rv1182
papA3
PKS-associated protein, unknown
function
Rv1528c papA4
PKS-associated protein, unknown
function
Rv2939
papA5
PKS-associated protein, unknown
function
Rv2946c pks1
polyketide synthase
Rv1660
pks10
polyketide synthase (chalcone
synthase-like)
Rv1665
pks11
polyketide synthase (chalcone
synthase-like)
Rv2048c pks12
polyketide synthase (erythronolide
synthase-like)
Rv3800c pks13
polyketide synthase
Rv1342c pks14
polyketide synthase (chalcone
synthase-like)
Rv2947c pks15
polyketide synthase
Rv1013
pks16
polyketide synthase
Rv1663
pks17
polyketide synthase
Rv1372
pks18
polyketide synthase
Rv3825c pks2
polyketide synthase
Rv1180
pks3
polyketide synthase
Rv1181
pks4
polyketide synthase
Rv1527c pks5
polyketide synthase
Rv0405
pks6
polyketide synthase
Rv1661
pks7
polyketide synthase
Rv1662
pks8
polyketide synthase
Rv1664
pks9
polyketide synthase

Nature Macmillan Publishers Ltd 1998

Rv2931
Rv2932
Rv2933
Rv2934
Rv2935
Rv2928
Rv1544

ppsA
ppsB
ppsC
ppsD
ppsE
tesA
-

phenolpthiocerol synthesis (pksB)


phenolpthiocerol synthesis (pksC)
phenolpthiocerol synthesis (pksD)
phenolpthiocerol synthesis (pksE)
phenolpthiocerol synthesis (pksF)
thioesterase
probable ketoacyl reductase

J. Broad regulatory functions


1. Repressors/activators
Rv1657
argR
arginine repressor
Rv1267c embR
regulator of embAB genes
(AfsR/DndI/RedD family)
Rv1909c furA
ferric uptake regulatory protein
Rv2359
furB
ferric uptake regulatory protein
Rv2919c glnB
nitrogen regulatory protein
Rv2711
ideR
iron dependent repressor, IdeR
Rv2720
lexA
LexA, SOS repressor protein
Rv1479
moxR
transcriptional regulator, MoxR
homologue
Rv3692
moxR2 transcriptional regulator, MoxR
homologue
Rv3164c moxR3 transcriptional regulator, MoxR
homologue
Rv0212c nadR
similar to E.coli NadR
Rv0117
oxyS
transcriptional regulator (LysR
family)
Rv1379
pyrR
regulatory protein pyrimidine
biosynthesis
Rv2788
sirR
iron-dependent transcriptional
repressor
Rv3082c virS
putative virulence regulating
protein (AraC/XylS family)
Rv3219
whiB1
WhiB transcriptional activator
homologue
Rv3260c whiB2
WhiB transcriptional activator
homologue
Rv3416
whiB3
WhiB transcriptional activator
homologue
Rv3681c whiB4
WhiB transcriptional activator
homologue
Rv0023
putative transcriptional regulator
Rv0043c transcriptional regulator (GntR
family)
Rv0067c transcriptional regulator
(TetR/AcrR family)
Rv0078
transcriptional regulator
(TetR/AcrR family)
Rv0081
transcriptional regulator (ArsR
family)
Rv0135c putative transcriptional regulator
Rv0144
putative transcriptional regulator
Rv0158
transcriptional regulator
(TetR/AcrR family)
Rv0165c transcriptional regulator (GntR
family)
Rv0195
transcriptional regulator
(LuxR/UhpA family)
Rv0196
transcriptional regulator
(TetR/AcrR family)
Rv0232
transcriptional regulator
(TetR/AcrR family)
Rv0238
transcriptional regulator
(TetR/AcrR family)
Rv0273c putative transcriptional regulator
Rv0302
transcriptional regulator
(TetR/AcrR family)
Rv0324
putative transcriptional regulator
Rv0328
transcriptional regulator
(TetR/AcrR family)
Rv0348
putative transcriptional regulator
Rv0377
transcriptional regulator (LysR
family)
Rv0386
transcriptional regulator
(LuxR/UhpA family)
Rv0452
putative transcriptional regulator
Rv0465c transcriptional regulator
(PbsX/Xre family)
Rv0472c transcriptional regulator
(TetR/AcrR family)
Rv0474
transcriptional regulator
(PbsX/Xre family)
Rv0485
transcriptional regulator (ROK
family)
Rv0494
transcriptional regulator (GntR
family)
Rv0552
putative transcriptional regulator
Rv0576
putative transcriptional regulator
Rv0586
transcriptional regulator (GntR
family)
Rv0650
transcriptional regulator (ROK
family)
Rv0653c putative transcriptional regulator
Rv0681
transcriptional regulator
(TetR/AcrR family)
Rv0691c transcriptional regulator
(TetR/AcrR family)
Rv0737
putative transcriptional regulator
Rv0744c putative transcriptional regulator
Rv0792c transcriptional regulator (GntR

Nature Macmillan Publishers Ltd 1998

2,100,001

1,950,001

1,800,001

1,650,001

1,500,001

1,350,001

1,200,001

1,050,001

900,001

750,001

600,001

450,001

300,001

150,001

aA

sM

43
09

urF

42
09

cy

bU

pra

etB
m

44 45
09 09

12
12

PE

15
12

pB

10 pC
tr
16

B
trp

55 856
1
18

od

27
17

28 729
17
1

h
nd

26
17

A
61 h
18 ad

63 64 65
18 18 18

32 33 34 35
17 17 17 17

B
C
od
od
m
m

D
od
m

1
bD
ga

30
17

bc

rX

na

trp

E
trp

47
13

74
14

aA

ch

73
14

uT
le

t
lg

18 19 20
12 12 12

RS
G
_P
PE

cD
su

14

fa

dE

oY

ph

bP

17
12

12
A B hA
trx trx ec

fa

86
10

50
09

cC
su

T
ho

RS
G
_P
PE

15

dE
fa

ctp

4 3
s1 34
pk 1

21
05

20
05

ga

3
rK
na

35
01

41
01

54
09

22
08

rp

oC

66
18

2
rK
na

ac

48
13

sig

38
17

87 PE
03 P

pk

42
01

nA

49
13

A
htr

67
18

39
17

py

kA

55
09

23
08

78
14

68
18

40 41 42
17 17 17

1
sB
te

77
14

44
01

ro

dA

26
12

19
16

69
18

nE
pk

fa

dE

70 71
D2
18 18 lld

bH
fa

A
en
m

dE

fa

47
17

p
ap

hA
in

73 74 75
18 18 18

dB
cy

56
13

bG

97
10

86
14

A
bfr

34
12

qY

25 uV
le
16

77
18

49
17

48
17

27

90
14

79
18

51
17

52
17

91
14

lA

A2

c
ac

lp

E
PP

sA

rp

54
17

31
16

utB
m

62 63
13 13

RS
G
_P
PE

'
tB
ly

plc

66
13

67
13

RS
G
_P
PE

45
08

89 90
18 18

35
16

99 00
14 15

rF
lp

cA

su

94
18

91 92 93
18 18 18

RS )
G 22
_P ag
PE (w

34
16

98
14

10
58
61
IS
17

56 57
17 17

88
18

rL
na

90 691
0
06

43
08

82
02

pta

hE

ad

95
18

8
s1
pk

50
12

d2

gn

60
17

lp

98
18

pD

61 62
17 17

rA

uv

oB

J
lip

10
61
IS

cin

sX

ly

09
15

77
13

10
15

78
13

htp

03 04
19 19

aa

yrR

55
12

dA

gp

91
02

51
00

dA

gm

yrB

56
12

29
11

06
19

07
19

69
17

yrC

57
12

45
16

tG
ka

70
17

58
12

A1
glt

grc

C2

65
05

73
17

72
17

47
16

15
15

24
04

61 62
12 12

etE
m

62
08

PE

2
iB
am

yrF

14
19

13
19

76
17

75
17

eS

ph

Aa

e
ac

77
17

eT

ph

sig

83
01

ctp

nrd

71
05

Ab

e
ac

78
17

fa

2
dD

79
17

nH
pk

E
PP

80
17

81
17

arg

12
pL
m
m
C

bR
em

37 38 39
11 11 11

F k 0
9
IH
dfp
m gm 13

65
12

RS
G
_P
PE

E
PP

E
PP

36
11

64
00

26
07

75
05

02
10

RS
G
_P
PE

93
13

arg

E
PP

82
17

arg

24
15

44
11

04
10

26
15

95
13

83
17

19
19

ic

74
08

20
19

pF

lp

arg

73
12

45
11

31
07

47
11

76
08

85
17

s5

22
19

86
17

0
s1
pk

pk

97 98 pH
13 13 li

D
lip

94
01

80
05
qN

lp

81 82
05 05

39
04

I
lip

24
19

E
PP

01
14

77
12

fa

31

dD

pA

26
19

E
PP

s7

pa

pk

riA

78
12

51
11

49 50
11 11

09
10

79
08

E
IK
-L
IS

08
10

E
PP

P
RE

48
11

77
08

37
07

EL

gro

38
07

84
05

41
04

11
03

fa

2
dD

05
14

27
19

28 29 30 31
19 19 19 19

95
17

tp

18

dE

fa

17

dE
faa

88
08

36
19

98
17

S
ile

rG
lp

7
s1
pk

10
14

sN

cy

37
19

pT

s9

pk

90
08

rL
lp

52
04

E
PP

88
12

rH
na

89
12

90
12

rJ rI
na na

66
16

E
PP

A
hB 39
ep 19 rib

1
s1
pk

sA

an

94
08

RS
G
_P
PE

22
14

P
RE

45
19

pG
lp

04 05
18 18

72
16

44 45 46
15 15 15

21
14

arg

qW

lp

1
aE

dn

24
14

64
04

E
PP

dD

fa

rB
th

PE

12

E
PP

54
19

fa

39
03

66
04

58 59 60
19 19 19

62
19

61
19

12
18

16
80
dE
16

fa

74
11

63
19

13
18

PE

14
18

82
16

B1
pls

X
oe

29
14

dH

2
aA
um

1
aA
um

60
15
IS

67
07

07
09

16
06

A
ald

83
16

64 965
19
1

15 16
18 18

A
frd

31
14

rfe

43
03

C
lip

79
11

PE

E
ctp

69
07

3
ce
m

17
18

67
19

RS
G
_P
PE
68
19

85 86 87
16 16 16

84
16

qJ
lp

69
19

19
18

88
16

ty

G
atp

tA
gg

rM
lp

Z
glg

i
tp

71
19

G
ilv

79
04

25
02

13
09

74
07

Y
glg

cA

76
19

72 73 74 75
19 19 19 19

se

cN
re

X
glg

bis

urA
m

3
pA
pa

48
10

PE

tP
be

79
07

cB
re

84
04

s
rr

85
04

E
PP

98
16

77
19

78
19

79
19

4
pt6
m

F
nrd

cC

f2

2
dD

f
rr

82
19

P
RE

84
19

RS
G
_P
PE

v
gc

t
tk

18
13

cA
ro

57
10

23
09

85
19

33
18

03
17

72
15

cy

cA

86
19

34
18

93
04

94
04

56
03

36
02

11
01

95
04

rA
pu

86
07

27
09

I
sig

20
13

90
11

87 88
19 19

89 90
19 19

36
18

E
PP

91
19

qI
lp

92
07

93
07

dB
lp

4
dA

3
dD

G
ctp

glc

07
17

RS
G
_P
PE

fa

fa

63 qV
10
lp

r
qo

87 88
15 15

93 94
19 19

95
19

41
18

96
19

56
14

F
ctp

42
18

13
17

57 458
1
14

27
13

93
15

dA
na

59
14

P
glg

pE
da

37
09

98
19

aB
gu

99
19

d
gn

72
10

dB
na

60
14

48
02

22 23
01 01

47
02

45 46
18 18

fa

01
bG
fa
20

47 A B
18 ure ure

20 21
17 17

03
20

C
ure

22
17

23
17

30
13

98
15

dC 597
1
na

61
14

75
10

05
12

3
dA

39
09

05
08

53
06

lJ lL
rp rp

2
pL
m
m

04
08

G
din

04
12

73
10

38
09

rL
pu

37
dD 650
0

2
pS
m
m

fa

03
12

00
20

46
02

21
01

69 70 71 72
03 03 03 03

45
02

sA
fu

rB
se

68
03

8
hA chA
e

tA
ps

01
12

ec

tC
ps

02
08

pC 801
pe
0

48
06

2
4
aA 050
cm

00
12

69
10

92
15

5
dE

7
dD

3
T
6
8
14
19
dB 71 717 71
17
1 1 1
17 pro
fa

B 90 91
bio 15 15

55
14

99
11

fa

fa

65 366 367
03
0 0

02
05

1
oS
ph

99
07

81
97 98 10
11 11
IS

B
glg

P
RE

tB
ps

98
07

64
03

fa

2
dA

c
ox

1
lE
ga

47
06

a
fb

y
ox

RS
G
_P
PE

53
14

k
09 10 11
17 17 17 cm

86
15

47
15
IS

tS
ps

E
PP

bG

97
07

G
lip

C
pro

gtE
m

fa

RS
G
_P
PE

38 39
RS
G
18 18
_P
PE

08
17

94
11

24
13

PE

RS
G
_P
PE

65 66
10 10

nD
pk

10
61
IS

95 96
07 07

4
3
2
1
aA
aA
aA
aA
m
m
m
m
m
m
m
m

82 583 584 585


15
1 1 1

cta

92
11

61
03

16
01

41
02

98 499
04
0

60
03

97
04

15
01

38 39 40
02 02 02

hA 14
gm 01

58 59
03 03

96
04

a
gc

2
2
1
oS
tC
tA
ps
ps
ph

91
07

21 22
13 13

91
11

77 78 79 80 81
15 15 15 15 15

E
PP

35
18

76
15

26
09

89 90
07 07

rQ
pu

59 060 061 062


1
10
1
1

RS
G
_P
PE

19
13

88
11

fa

14
dD

25
09

87
07

E G
rT etT 35 36 37 pT c s lK lA
th m 06 06 06 tr se nu rp rp

92
04

RS
G
_P
10
01
PE

35
02

3
gX

p
m
nra

34
06

re

08
01

D 71 73 74 75
bio 15 15 15 15

86
11

alk

02
17

F
bio

l
ta

t
og

84
07

3
nX

85
07

se

E
PP

2
bD
ga

3
33
hA 06

22
09

35
15
IS

21
09

83
07

ec

88 pm
04
g

32 rdB
02
n

I
ctp

54 55 X 56
10 10 leu 10

00 01
17 17

A
bio

zw

fa

53
10

52
10

20
09

4
dE

87
04

fa

54
15
IS

30 31
18 18

rG
py

66 67
15 15

rrl

84
11

44 vB cA
14 de op

51
10

T
arg

18 19
09 09

re

86
04

Bb trBa
rC
p
pu ptr

10
pL
m
m

65
15

97
16

06
01

29 230
02
0

aJ pR
dn hs

28
02

B
m
rp

49 50
10 10

43
14

78
07

83
04

E
grp

27
02

04
01

2
sA 23 24 25 vH 27 28 29
pg 18 18 18 gc 18 18 18

95
16

RS
G
_P
PE

14
13

rB
pu

cD
re

urB
m

aK
dn

E
PP

47
10

81
10
IS

57
15
IS

46
10

13
13

92 93 A
16 16 tly

G
ec

C 12
atp 13

s
pk

14
09

76
07

75
07

B
ctp

26
02

80 81
04 04

28
06

26 27
06 06

44 45
10 10

39
14

D
atp

43
10

rJ 91
lp 16

60 61
15 15

k
pg

24
02

02
01

47 348 349
0
03
0

25
06

22 23 24
06 06 06

E
IK
-L
IS

41 42
10 10

rS

A
ilv

p
ga

23
02

09 10 11 12
09 09 09 09

s
pk

A
atp

P2
aro

A1
ch

74 475 476 477 eoC


04
0 0 0 d

21
06

73
04

45
03

21
02

70 71 urD
07 07
p

6
pL 8
m 55
1
m

35
14

33 34
14 14

B dC D 56
frd fr frd 15

32
14

H
03 B E F
13 atp atp atp atp

xC 78
fd 11

33 34 35 36 37 38 PE
P
10 10 10 10 10 10

76
11

19
02

71 72
04 04

42
03

18
02

nrp

'
lK
14 15 17 lT lT
06 06 06 ga ga ga

2
dB

66
07

fa

41
03

W
lip

00
01

16
02

1
dD

40
03

fa

6
hA 906
ec
0

65
07

K
m 301
1
he

fa

32
10

cD

ac

eA

13
06

ac

3
dE

fa

97 98
00 00

4
dD

12
06

pC
kd

03
09

E A
m
rp prf

'
11 11
dD dD
fa
fa

28
14

o
rh

10 gtC
18 m

76 bF 678
16 ds
1

O
lip

rC
th

73
11

02
09

pB
kd

pA 00 01
om 09 09

pA
kd

98
08

10 611
0
06

65
04

13
02

E
PP

59 60 hB 762 763 764


07 07 ad
0 0
0

oR
ph

52 53 55 56 57
19 19 19 19 19

E
PP

75
16

48 49 50 51
19 19 19 19

E
PP

97
08

o
ph

70 171
11
1

25
14

rA
th

PE

pD

kd

A2
glt

rV 56
th 07

38
03

dR
na

05 06 07 08 09
06 06 06 06 06

36
15
IS

P
RE

94 095
0
00

62 463
04
0

93
00

A
ck

pC

as

ctp

59 60 61
04 04 04

36
03

03 qO
06 lp

PE

73 674
16
1

47
19

PE

23
14

sA

ly

pE

kd

E
PP

E
PP

95
08

67
11

RS
G
_P
PE

69 70 71
16 16 16

43
15

42 43 44
19 19 19

41
19

E
PP

68
16

rI
lp lbN
g

67
16

0
pA 54
1
ls

vrC

91 V
12 arg

65
11

57
04

32 33 lA
03 03 rm

10
02

58
04

91
00

09
02

98 99 00 01 rA
05 05 06 06 tc

31
03

07 08
02 02

88 89 090
0
00 00

24 25 26
10 10 10

97
05

93
08

o
en

sA
m
m

95 96
05 05

55 A2
04 ch
e

54
04

cE

hy

3
pL
m
m

29 30
03 03

92
08

qU

lp

94
05

E
PP

dE

fa

21
10

91
08

sB
m
m

27
03

cQ

hy

28
03

05
02

cP

hy

A2 bH 17 rH
19
rib
ri 14 lp
14

87
12

fd
m

din

13 14
14 14

rG
na

lp

92
05

4
pS
m
m

23
03

04
02

cD

hy

24 25 26
03 03 03

03
02

83
00

RS
G
_P
48 49 50
07 07 07
PE

91
05

4
pL
m
m

A
cit

C
rib

84 ysD
c
12

19
10

90
05

gA

ud

11
pL
m
m

79 80 81 82
00 00 00 00

20 d
03 dc

T
U
ln
glm g
2
utT
m

13
hA
ec

97
17

49
04

2
ce
m

RS
G
_P
PE

pB

op

35
15

G
rib

pC

59
11

op

34
15

e
rp

96
17

s8

pk

33
15

pD

87
08

45
07

qT rsA
lpp
p

rBB

fp

58
11

op

u
fm

57
11

48
04

86 87 588
0
05 05

43 44
07 07

pth rplY

85
08

32
15

31
15

t
fm

pA

56
11

6
s1
pk

op

ad

92 93 94
PE 17 17 17

03
14

04
14

79
12

55
11

rC

se

11 12
10 10

t
4
om 115

52
11

gA

ks

83
08

'
57
15
IS

gllp

pc

01
02

99 00
01 02

78
00

2
3
Q lyU 18
g 03

98
01

76 77
00 00

15 16
03 03

75
00

44 K 46
A1
04 sig 04
ufa

85
05

43
04

14
03

97
01

74
00

RS
PG
39 40 41 _
07 07 07 PE

E
PP

12
03

13
03

73
00

95 96
01 01

'
71 P 72
00 RE 00

80 81 82
08 08 08

10
03

08 09
03 03

gly

A2

'
L 36
k ap
ad m sig 07

E
PP PE

76
12

A3

79
05

oa

07
03

aA

sd

06
03

etS
m

c
se

rB rC
lp lp

46
11

75
08

30
07

93
01

68
00

sA psd

ps

67
00

06
10

E
PP

92
01

RS
G
_P
PE

35
04

RS
G
_P
PE

84
17

d2

91
01

bB

pa

F
R
G
arg arg arg

l2
bb
w

94
13

cr
m

fa

1
dE

lB
xy

34
04

90
01

65
00

77
05

72
12

28
07

76
05

03
10

cA

fu

68 69 rA 71
12 12 lp 12

23
15

D
ilv

C
30 31 d
33
04 04 so
04

E
PP

87 88
01 01

11
10
hA chA
ec
e

40
11

arc

etK
m

arg

99 00
09 10

cs

25
07

pB

de

74
05

2
D2 A 870
oa oa 0
m m

V
96
97 98
09 ala 09 09

64
12

34
11

J
rim

glS

63
00

28
04

p
sp

73
05

26 thA
04
x

72
05

67
08

C2
E2
63 oa og oa
08 m m m

lA

ce

84 85
01 01

60 61
00 00

RS
G
_P
98 99 00 01 02 03
02 02 03 03 03 03
PE

81
01

59
00

D
lN lX lE sN sH lF lR sE m lO
rp rp rp rp rp rp rp rp rp rp

69
05

80
01

A
oe
m

13
07

68
05

lU

ga

ats

rO

lp

aB

dn

17 518 519 520


1
15
1
1

arB

60
12

74
17

48
16

16
15

59
12

32
11

12
07

61
08

90 91 92
09 09 09

fa

dB

ats

66
05

iC
th

95
02

rT 567
0

ty

5
rA 910 pC
dB
1 lp
fa

fu

71
17

PE

14
15

82 arA
13
c

30
11

88
09

fa

dA

93
02

94
02

75 76 77 78
01 01 01 01

52 sF b sR lI 57
00 rp ss rp rp 00

20 21 iD
04 04 th

92
02

74
01

qM
lp

iA 513
ep
1

58
08

87
09

56 57
08 08

I
fC m lT nR
in rp rp ts

54
12

28
11

P
RE

RS
G
_P
PE

aD

de

dK

pp

sc

86
09

54 r
08 fa

B2
oa
m

66 67
17 17

nT

na

pd

C1

grc

lp

qL

rK
lp

nA

po

C
sJ lC lD lW lB sS lV sC lP m sQ
rp rp rp rp rp rp rp rp rp rp rp

61
05

15 16 iG
04 04 th

90
02

72
01

49
00

87 88 89
02 02 02

71
01

47 048
00
0

59 60
05 05

76
13

rE
lp

iE
th

iE
ub

83
09

26
11

'
B9
IS

08
15

75
13

25
11

fa

1
dD

E
PP

70
01

46
00

3
utT
m

PE

98 99
06 06

57
05

12
04

82
09

65
17

39
16

74
13

51
12

63 64
17 17

hC

ep

81
09

07
15

73
13

bp

03 04 05 06
15 15 15 15

71
13

zw

56
05

'
06 51
16 08
IS

49 50
08 08

97
06

D
en
m

H
gln

84
02

1
ce
m

44 045
0
00

67 168
01
0

42 043
00
0

96
06

3
sM

cy

95
06

79
RS
09
G
_P
PE

lp

qS

96 97
18 18

02
15

37
16

36
16

01
15

69 70
13 13

10
61
IS

49
12

18 19 20
11 11 11

RS
G
_P
PE

46
08

nG
pk

fa

5
dD

C
oC
en
m
bp

kA
ac

D1
lld

52
05

uS
le

83
02

65
01

63 64
01 01

qE
92
06 pq

dD

fa

07
04

81
02

61
01

39 40
00 00

38
00

12 13 14 15 16 17
11 11 11 11 11 11

76
09

42
08

L
lip

rB
uv

06
04

E
PP

37
00

B 49 50
en 05 05
m

89
06

88
06

PE

36
00

45 46 47
12 12 12

65
13

qZ

lp

87
18

32
16

13

11
11

dE

fa

94 95 96
14 14 14

bU

rs

87
06

46 47
05 05

40 841
08
0

PE

s6

pk

34

dD

fa

RS
G
_P
PE

58
01

33 34
00 00

86
06

cD
ac

39
08

09
11

qR

f
tu

dh 41 42
m 12 12

06 eB eA
11 xs xs

utA
m

F2

E 3 4
A
en 54 54 pit
m 0 0

pE 882 883 884 885 pB


1
1 1 1
fb

lp

po

fu

30

dD

fa

RS
G
_P
PE

sA

41
05

37
08

E
PP

05
11

rA

co

60
13

04
11

g
su

36
08

7
12
hA
dE
ec
fa

70
09

lp

qQ

80
18

59
13

gB

su

39 40
05 05

1
pS
m
m

77
02

81 sL sG
06 rp rp

38
05

1
pL
m
m

dD

fa

76
02

2
53
dE
01
fa

bio

A
B
tA tA ntB
pn pn
p

27 28 29 30 31
00 00 00 00 00

01 02 03
11 11 11

gA

su

ctp

27 628
16
1

A3
gln

1
dD
fa

26
16

89
14

58
13

lp

99
10

00
11

67 68
09 09

87 488
14
1

m
fu

33
12

24
16

Z
m
he

57
13

32
12

96
10

62
09

31
12

2
oH

ph

60 61
09 09

RS
G
_P
PE

79 80
06 06

78
06

37
05

dE
fa

01
04

74
02

26
00

PE

72 273
02
0

5
pS
m
m

2
lE
ga

PE

23 24 25
00 00 00

qK

lp

63 64 65 66
09 09 09 09

RS
RS
G
G
T pT eT _P
_P
PE
gluas ph PE

5
pL
m
m

pn

98
03

fa

50
01

49
01

95 96 97
03 03 03

fa

D2

48
01

21 022
0
00

uT
le

A4 74 hA5
06 ec

h
ec

30
12

nF

pk

82
14

Y
oe
m

dD

cy

81
14

rp
m

sA
de

59
09

94
03

RS
G
_P
PE

31 sT
08 ly

31
05

93
03

20
00

47
01

69
02

19
00

68
02

46
01

29 30
08 08

44 45
17 17

dC
cy

qX

lp

gly

54
13

27
12

58
09

R
80
ox
m
14

53
13

lp

92
03

30
05

qP

pp

rU
na

45
01

'
27 28 05
08 08 16
IS

26
08

d
en

sB

cc

90 etZ
m
03

aA
co

rH
pu

25
08

28
05

rT

pu

66
02

43
01

pb

pA

2
bG 51 52
fa 13 13

25
12

24
12

RS
G
_P
PE

rN
pu

1
sA

de

69
06

25 26 sA
05 05 cc

76
14

15 16
16 16

22
12

90
PE PE 10

53
09

23
05

L
m
he

86
03

62 63 64
B2
02 02 02 fec

37
01

nB
pk

38 39 40
01 01 01

12 bA
00 pa

36
01

10 11
00 00

iA

85
03

18 819
08
0

rp

oB

19
05

60
02

pp

F
ph

08
00

33

dD

85
10

16
12

44
13

84
10

rD
uv

18
05

clp

iX 17
th 08

64 65 66
06 06 06

17
05

57 58 59
02 02 02

32
01

33
01

07 T T
00 ile ala

F I2
B H A
pA
his his his im his his

24 25
17 17

his

64 465 466
14
1 1

F G D
ure ure ure

D
his

62 63
14 14

E
PP

16
05

47 48
09 09

82 83
10 10

glg

A 81
gre 10

pg

dE
fa

13 C2 A2
08 se ys
s c

ats

15
05

C
ab

B 3 4
m
1 1
he 05 05

10 11
08 08

rM
pu

b
co

2
pC

fb

30
01

rA
gy

80 381 pA 383
0 um
03
0

77 378 ec
03
0 s

co

D
nir

28
01

rB

gy

56 57 58 59 60 61 62
06 06 06 06 06 06 06

s
cy

76
03

B
nir

27
01

cF 004
re
0

2
08 09 gA 11
lP
fo
12 12 ta 12

41
09

07
08

55
06

C
m
he

74 75
03 03

26
01

aN

31 32 333 334 335 ysM 337 urI 339


hA 41
m
rp 13
13 13
1
1 1 c
1
1

dD

fa

U
lip

40
09

s
cp

54
06

A
m
he

73
03

08
05

dn

49 50 sp
02 02 h

RS
G
_P
pA
pe
PE

dn

2,250,000

2,100,000

1,950,000

1,800,000

1,650,000

1,500,000

1,350,000

1,200,000

1,050,000

900,000

750,000

600,000

450,000

300,000

150,000

4,350,001

4,200,001

4,050,001

3,900,001

3,750,001

3,600,001

3,450,001

3,300,001

3,150,001

3,000,001

2,850,001

2,700,001

2,550,001

2,400,001

2,250,001

lE
fo

PE

rA

ty

26
32

hD

ad

30
25

07
24

82
22

E
74 at6
PP 38 es

08
20

28
32

85
34

76
38

pro

36

dE

77
38

fa

60
37

F
lip

86
34

85
22

11
24

86
22

fts

11
20

09 10
20 20

90
30

91
30

79
38

hA

92 93
34 34

80
38

64
37

81
38

65
37

19 20 PE
P
36 36

qH

lp

18
36

91
34

PE

lp

82
38

26
36

96
34

96
30

83
38

PE

27
36

97
34

39
32

84
38

96 97
22 22

urF
m

22
24

98
22

urE
m

24
20

gK

24
24

ald

05 06
27 27

04
27

pc

59
28

09
27

29
36

4
ce
m

85
38

73
37

C2
his

pp

98
34

86
38

02 xD
35 fd

88
38

fa

26

dE

27

35
36

78
37

fa

dE

89 90 91 PE
P
38 38 38

77
37

2
lB
rm

rU
se

76
37

31 32 33
36 36 36

87
38

21
hA
E
ec
lip

30
36

qB

lp

61 62 63 64
33 33 33 33

prf

rA

fa

17

65
33

PE

79
37

34
15
IS

39
36

sp

oU

pE
gc

40
36

94
38

68
33

43
36

95
38

iB
ep

54
32

87
37

79
29

pA

99
38

hfl

66
25

dA

02
39

91
37

03
39

92
37

10
35

77
33

27
31

nA

51
36

78
33

69
25

84
28

31
27

08
39

09
39

bA
em

54 55 56 57 58
36 36 36 36 36

id

B
trb

lD
rm

31
31

lS

ts

sB

rp

60
36

10
39

bB
em

93
29

91
28

19

62
36

71
32

pC

dp

98
37

lM
cw

57
15
IS
B2 C
trx trx

pD

dp

19
hA 517
3
ec

55
24

his

16
39

cD

ac

pB

dp

rA

pa

pA

dp

19
35

73
32

rC
qc

50
20

lA
re

clp

rB
pa

20
35

00
30

0
gid 392

hF

fd

ap

58
24

E
PP

68
36

23
35

94
33

oe

cF

49 50 751
27 27
2

se

P2 lpP
clp c

37
23

25
35

aA

71
36

oE
nu

02
38

72
36

sA

id

pC

fb

73 nth
36

28
35

yA

ph

pA

da

PE

pA

fb

31
35

05
38

pfk

12
29

c
da

bH

co

glb

dg

80
36

E
PP

06 07
38 38

79
36

01
34

tC

ga

13
30

91
32

08
38

4
hiB
w

E
PP

02
34

t
la

'
nA

glf

po

34
35

03
34

oL

nu

G
pir

83
36

36
35

bK

plc

12
22

bM

co

76
24

plc

p
pe

bL

co

ffh

95
32

oN
nu

17
30

cs

V
84
36 pro

37
35

RS
G
_P
PE

85
36

ufa

A2

D
gln

E
PP

B
gln

23
30

81
10
IS

t
am

13 814 815 816


38
3
3
3

89
36

1 2
2
ltp 354 354

aB

gu

17
38

90
36

29

dE

fa

28

18
38

91
36

dE

fa

19
38

sm

78 79
27 27

E
PP

93
36

94
36

21
38

fa

95
36

glp

49
35

sI

gp

dA

hp

alr

8
pL
m
m

97
36

pA

pa

99
36

53
35

E
PP

iA
am

74
31

00
37

01 702
3
37

55
35

28
34

36
30

s2

pk

03
37

up

dA

fa

B
m
pm

hA

gs

57
35

23
26

05
37

E
PP

91
27

92
27

27
26

fa

dD

23

30

dE

fa

dB

ga

as

as

29
38

cB

32
38

31
38

aQ
dn

fa

33
dE

S
glm

hA
sd

98
27

sC
pp

33
38

se

12
37

oA
nh

rS

37 38
34 34

B
sp

etU
m

69
35

35
38

36
38

14 cR 16
37 re 37

Q
ob

67 68
35 35

39 440 rsA
34
3
m

27
33

E
nrd

J
sig

94
31

37 eA
38 ph

18
37

17
37

70
35

71 572
35
3

sD
pp

29
33

95
31

Q
glp

43
38

X
aZ
dn

gI
su

97
31

22
37

75
35

13
28

48
34

c
ac

48
22

10
61
IS

B2
ars

dA 47
so 38

26
37

79
35

50
34

35
33

34
33

99
31

37 38
33 33

22
dE

B
C
drr drr

fa

cy

sS

15
21

54
34

d1

81 82 83
35 35 35

etC
m

19
28

B
lig

28
37

qE
lp

54
38

G
49 50 51 s en
38 38 38 hn m

27
37

11 T
25 his

K
lip

12
25

81
10
IS

E2
trp

13
25

54 55 56 257
22 22 22
2

52 53
22 22

87
23

58
22

tA

etA 342
m
3

02
32

cs

ra

dA

55
38

29
37

87 588
35
3

56 57
38 38

86
35

ala

63
34

D
glt

C
lig

RS
G
_P
PE

30
37

pS
lp

ly

sU

A
nir

E
PP

Z
oe
m

07
32

33
37

B
glt

32
37

91
35

34
37

B2
ilv

37
37

62
38

63
38

E E
PP PP

oA
bp

72
34

C
clp

71
34

ats

ly

fa

sA

41 42
37 37

cy

nK
pk

s
pk

PE

17
32

43
37

68
38

sT

69
38

45 PE
37

44
37

21
32

S
vir

2
dD

83
30

80
34

22 H
32 sig

fa

51
37

70
38

71
38

49 50 rX 52 53
37 37 se 37 37

47 48
37 37

05 lK lX lP
36 fo fo fo

79
34

04
36

E
PP

20
32

81
30

s
dx

02
24

77
22

rD
py

40 sA 842
28 nu
2

01
24

76
22

pL
lp

81
26

bI
su

75
22

5
s1
pk

fB
in

1
hiB
w

03
36

E
PP

cy

74
22

15
hA 680
2
ec

sW

18
32

37 fA
28 rb

E
m
he

cy

sS 599 600 anD anC


3 3 p
p

tP
kg

46
33

64 865 866 867


38
3 3
3

40
37

r2
ls

10
61
IS

69
22

RS
G
_P
PE

68
22

pN 71 72 73
lp 22 22 22

33 34 35 36 37
21 21 21 21 21

32
21

Y'
m
he

sQ

F
din

cy

tD ntC 216
en
3
e

79
30

78
30

74 75
34 34

13
32

pX
lp

RS
G
_P
PE

76
30

67
22

2
sS

75 76
26 26

74
26

95
23

66
22

cy

pE gpA
u
ug

44
29

12
32

pB
ug

43
29

33
15
IS

lE
rh

75
30

60 61
38 38

35 736
37
3

RS
G
_P
PE

94
35

E
hp
m

RS
G
_P
PE

10
32

09
32

74
30

pC
ug

16
hA
ec

73
26

22 cpS
25
a

72
26

20
25

p
bc

tB
gg

65
22

29
21

sP 128
2
an

D
rib

7
pL
m
m

72 73
30 30

3
lB
rm

92 qF
35
lp

P
RE

70
26

64
22

sH 93
cy 23

PE

63
22

RS
G
_P
PE

25
21

28 29 30
28 28 28

08
32

69 70 71
30 30 30

fa

27
28
2
dD

26
28

lC 66 67
lB
rm rm 34 34

05
32

25
28

'
81
10
IS

A
m
pg

24
28

V 04
lip 32

utY
m

16 17
25 25

89 90
23 23

61 62
22 22

'
62 63 64 65 66 lpX 668 669
2 2
c
26 26 26 26 26

rE 66 67
em 30 30

23
28

64
30

as
m

20 21 22
28 28 28

etH
m
2
hE 260
2
ad

N
m
he

E
PP

14 515
25
2

G isI
20
21 his h

19
21

A
A lQ
D K M J fA
tru rp rpo rps rps rps rpm in

5
pA
pa

18
28

01
32

ic

10
25

btA
m

51
22

18
21

7
pK 1
lp 21

51 52 53 54 655 656 657 658 659 660 661


26 26 26 26
2 2 2
2 2
2
2

09
25

50
22

51 52 53
34 34 34

S
trp

50
26

16 17
28 28

08
25

btB
m

00
32

48
38

60
30

A
drr

10
61
IS

14
21

D1
glp

13
21

06 507
2
25

D6

12
21

14 15
28 28

24 25
37 37

44 45
38 38

rV 23
se 37

49
34

33
33

2
rD
uv

59
30

sE
pp

77
35

gA
na

nM
pk

3
dD

btC
m

sB

fa

ka

B 11
1
prc 2

T T
5 6
7 8 9
glycys alU264 264 64 64 264
v
2 2

oB coA
s

44 lT
26 va

sc

btD
m

57 058
3
30

12
28

04
16
IS

74
35

30
33

96
31

ars

cD

55 inP
30
d

47
34

10
28

prc

M
A
bD cp as
k
a

E
PE PP

ac

fa

41 42
26 26

'
55
08 09 15 11
28 28 IS 28

34
dE

20
37

fa

39 40 B
38 38 bfr

19
37

cA

39 40
26 26

ac

42
22

10
61
IS

05 06
21 21

I H
54
nrdnrd 30

05 06 07
28 28 28

6
sI lM 44 45
rp rp 34 34 344

47
15
IS

93
31

50
30

btE
m

9
E 9
cit 249 dE1
fa

e
ac

03 04
21 21

02
21

A
35 36 ed 638
d
26 26
2

02 03 04
28 28 28

25 26
33 33

49
30

01
28

92
31

hA
pd

RS
G
_P
PE

hB
pd

20 21 E3 hA C3
10
33 33 oa gp oa
61
IS
m
m

hB
sd

91
31

03
16
IS

00
28

32 33
26 26

G
46 47
30 30 nrd

C
dh

97
27

99
27

hC
pd

lZ
he

lV pE 39 40
va ah 22 22

37
22

btF
m

00
21

bD
co

91 92 93 94
24 24 24 24

29 630 631
2
26
2

90
31

fe

30
38

uA
le

fa

32
dE

hC hD
sd sd

31
dE

fa

pV
lp

28
26

PE

32 33 A 35
22 22 ptp 22

RS
G
_P
PE

76 tH tG
23 mb mb

88 89
31 31

34 35
34 34

28
38

37
15
IS

fa

dD

d
cd

10
61
IS

oA

de

33
34

27
38

ad

10
61
IS

cta

82 83 84 85 186 187
3 3
31 31 31 31

2
rB
se

sB
pp

B 94
95
tru 27
27

25 626
26
2

97
20

75
23

bC
co

96
20

hrc

30
22

RS
G
_P
PE

J2
72
23 dna

29
22

93 94 95
20 20 20

28
22

PE

02
16
IS

24
26

89
24

06 07
37 37

59
35

31
34

52
15
IS

12
33

80 81
31 31

11
33

79
31

30
34

1
ltp

22
26

88
24

40
E
15
PP IS

10
33

pB
rn

27
22

38
17 040 041
30
3
3
hA
ec

s
pp

21

dE

fa

37
30

R
sir

26
22

lY
he

67 oH 69 370
23 ph 23
2

19 20 21
26 26 26

18
26

77 78
31 31

35
30

87
27

17
26

16
26

91
20

n
pa

90
20

5
6
x
be 236 236

24
22

pE

pe

RS
G
_P
PE

32
15
IS

oD

S
lip

de

27
34

xB

fd

E
PP

iB
am

75
31

34
30

26

dD

fa

pU sO ribF
lp rp

33
30

2
iA
am

nJ
pk

14
W hA
arg ec

23
22

RS
G
_P
PE

Q
lip

sA 29
te 29

32
30

24
34

98
36

20
hA 551 552
3
3
ec

04
33

72 73
31 31

31
30

rS
th

84
24

60 61 362
23 23
2

A2
gln

56
15
IS

84 85 86 87
20 20 20 20

58 rB
23 fu

83
20

c 6
7
rn 292 292

lp

30
30

70
31

fp

pR

pe

11 sA 13
26 pg 26

83
24

E
gln

gly

82
20

I 1 2
rim 342 342

D2
glp

gc

48
35

22
38

47
35

ES

gro
5

dA

00 Y1
33 ho
p

EL

gro

3
hiB
w

67
31

fix

23
29

81
27

10
26

69
31

B
fix

ald

B2
pls

E
PP

A1
gln

81
20

pJ

lp

09
26

68
31

26 027
30
3

2
pA

pa

R2
ox
m

45
35

ats

5
6
R3 16 16
3
3
ox
m

25
30

77
27

xH

pd

81
24

10
61
IS

10
61
IIS

E
PP

19
22

79
20

54 55
23 23

78 479 480
24
2 2

D 15
13
34 sig 34

qC

lp

12
34

i
ne

24
30

Y
fts

78
20

16 pB pA
li
li
22

76
27

75
27

06
26

77
24

E
PP

76 77
20 20

c
su

71 72 pB 74
27 27 da 27

61 162 163
31
3
3

86 87 88
36 36 36

PE

60
31

aB

gu

r
lh

75
20

03 604 B2
2 tes
26

02
26

74
20

plc

hD
ep

73
20

19 PE PE PE
P P
30

PE

E
PP

oD

ch

E
PP

17
29

E
PP

06 07 08
34 34 34

94
32

lp

qA

5 67
bG 27

95 96 97 98 99 00 peE
25 25 25 25 25 26
s

fa

oM
nu

15
30

co

sig

74 75
24 24

72 73
24 24

65
27

04 05
34 34

ald

15
29

35
35

A
lig

92
32

nI
pk

gc

vT

bla

46 47 48
23 23 23

E
ilv

67
20

45
23

71
24

09
22

bI

co

vB vA vC
ru
ru ru

RS
G
_P
PE

68 469
2
24

aG

dn

oH uoI uoJ uoK


n n
nu
n

tA

13
29

ga

F W 88 89
sig rsb 32 32

77 78
36 36

30
35

75 76
36 36

29
35

00
34

oG
nu

tB

ga

cA

ac

08
30

99
33

oF

nu

07
30

dD

fa

pe

pD

pQ 42
lp 23

05
22

bG

co

S
06 obT ob
c
22
c

63
20

M
S'
57 58 59 60 61 62 A yA
54
27 hsd hsd 27 27 27 27 27 27 dfr th

bT

ga

nT

as

04
22

03
22

bN

co

i
64 rp 466
24
2

hK

cb

D M 08 sP 10
trm rim 29 rp 29

pZ

lp

lp

88
25

81 82 seA 284
32 32
s 3

oD
nu

05
30

26 527
35
3

gu

4,411,529

32

dD

fa

69 hE
36 ep

24
35

95
33

cD

ac

oA oB oC
nu nu nu

B
ilv

04
30

pW

52
27

cD

se

9
pL
m
m

n
as

61
20

0
06
59 2
20

V P
gly lip

tig roU
p

2 2 G 2
sR sN m mB
rp rp rp rp

54
20

01 hB
pB lS
29 rn
le rp

85
25

59
24

78 irA
32 b

77
32

43
31

53
20

97 S3 199 taC
21 mp
2
c
m

52
20

36
23

fts

21 22 pA H
39 39 rn rpm

22
35

ac

3
s1
pk

21
35

nH
iu

rE urK
pu
p

E
ys

rB

qc

N
C
ilv ilv

47
27

42
31

B4

fa

pY

lp

aA

cm

25

dE

fa

K
ys

rA
qc

51
20

98 hD
28
fd

3
E2

fa

98
29

97
28

rA

ac

4
E2

fa

97
29

96
28

iB

pp

56
24

33
23

cta

49
20

43
ag 45 A3
27 d_ 27 gs
k
p
35

81
25

42
27

qD

lp

18
35

pfl

D
trp

ez
m

RS
G
_P
PE

B
viu

89
33

72
32

37
31

se

rA

B
lin

rC

xe

40
27

91
21

31
23

54
24

pP

lp

RS
G
_P
PE

35

dE

fa

ctp

93
28

uB
le

E
PP

94
29

E
PP

dD
fa

78
25

38 39
27 27

M 12
sig 39

ats

61
36

60
15
IS

51
24

1
rK
na

90
21

50 452 453
24
2 2

77
25

E
PP

PE

89
21

86 87
33 33

68 69
32 32

33 34
31 31

84 85
33 33

RS
G
_P
PE

sB

67
32

32
31

U U ltS
glu gln g

91
29

49
24

76
25

cA

re

88
21

2
s1
pk

27
23

73 74 75
25 25 25

va

26
23

15

dD

fa

35 cX
27 re

iC
am

34
27

90
29

87
28

89
29

bb
w

10
61
IS

lC

pS

as

fo

80 81 tB
ly
33 33

1
dD
fa
RS RS
G G
_P _P
PE PE

RS
G
_P
PE

30
31

2
lA
rm

29
31

86
28

39
15
IS

33
27

71
25

uC
le

85
28

32
27

63
32

79
33

28
31

kA 446
nd 2

70
25

25
23

85 86
21 21

24
23

82 83 84
21 21 21

2 1 23
cD cD 23
ro ro

81
21

pB leuD
hu

1
utT
m

rH

30
27

e
rn

cE

ro

68
25

py

62
32

pc

bC
em

PE

frr

26
31

RS
G
_P
PE

76
33

61
32

E
PP

04 05 06
39 39 39

49
36

2
58
32 whiB

59
32

pk

19
23

29
27

tA

dc

67
25

47
20

79 80
21 21

pI
lp

C
sp

aro

8
A
ia 272
m

24
31

83
29

'
18 18
hA hA iD
ec ec am

47 pA
36 cs

ilv

E
sp

A
m lU
rp rp

pF

22 23
31 31

gp

A
m
pm

21
31

lA

dd

da

ob

A
sp

77
21

58
15
IS

T
lip

3 9 0
sA
erT t5 7 8
m mp 28 28
cd

0 6
pt7 7
m 28

B2
ots

90
37

00 01
39 39

88 89
37 37

to

71
33

A
an
m

56
32

80
29

20

dE

3
E
sA eC oa 20
cy ss m 31

78
29

RS
G
_P
PE

45
36

2
aE

86
37

dn

53
32

B
oe
m

iL
th

38
15
IS

74
28

pro

nL

pk

42 cA 44
20 pn 20

15
23

75
21

41
20

74
21

14
23

fa

38
24

65
25

22 23
27 27

37
24

3
71 72 pt8
28 28 m

21
27

gln

sK

rb

13
23

sA

id

39 040
20
2

10 11 12
23 23 23

72
21

38
20

96 897 898
38
3 3

85
37

fic 642 rU 644


th 3
3

83
37

53
15
IS

RS
G
_P
PE

69
33

49 bB bA 52
32 ru ru 32
RS
G
_P
PE

15
31

g
un

70
28

81
10
IS

74 975
29
2

69
28

xA

le

61 62 63
25 25 25

35
24

B C D
oa oa oa 13 14
m m m 31 31

hH
sa

aA

cG
re

80 81 bE
37 37
rf

36 637 638
3 3
36

dD

08
31

72
29

67
28

60
25

9
etV 0
m 23

08
23

67 68 69
21 21 21

37
20

70 pM
21 lp

34 35 36
20 20 20

10
61
IS

33
20

17 18 19
27 27 27

16
27

59
25

15
27

65 66
28 28

14
27

57 58
25 25

E E 32 33
34
PP P 24 24
24

k
trA tm
m

71
29

64
28

07
31

trB
m

56
25

32
20

65 166
2
21

p
hs

06 07
23 23

64
21

30
20

pC pD
ah ah

05
23

13
27

pB

pb

pfk

63
28

N
lip

12
27

fp

ala

pro

62
28

eR
id

68 69
29 29

ap

54
25

26
24

28
20

03 04
23 23

01 02
23 23

60
33

42 43
32 32

57 58 59
33 33 33

41
32

00 501
35
3

51 52 53 55 lD
33 33 33 33 fo

54
33

cA

se

27
20

RS
G
_P
PE

sig

53
25

25
24

00
23

A4
gln

08
27

07
27

49 50 51 E
25 25 25 aro

tB 66
kd 29

57
28

sig

23
24

58
15
IS

G
htp

61
21

26
20

59 60
21 21

25
20

98 ssr 099 pB sX sE 103 104


ft 3
ft
3
30
3 sm

rU
pu

T
nic

pp

pA pB 545 546 547 548


lp 2 2 2 2

lp

70 771rgU rT
37
3 a se

68 69
37 37

J
es
m

rN
lp

t
hp

67
37

qG

66
37

94
34

fB 237 238
ke
3
3

94
30

95
30

62
29

63
29

rA
go

61
29

95
22

urX
m

18 19 20 21
24 24 24 24

hB
su

00
27

42
25

17
24

99
26

98
26

41
25

16
24

94
22

urD
m

20 21 22 23
20 20 20 20

18 19
20 20

RS
G
_P
54
PE
28

59 60
29 29

93
30

aro

15
24

92 93
22 22

fts

16 017
20
2

t
96 du
26

aro

95
26

aro

14
24

35
32

E
PP

92
30

58
29

52
28

93 94
26 26

31 dS 33 34
32 pv 32 32

ep

62
37

51
28

56 957
29
2

50
28

A
88 89
34 34 ots

30
32

55
29

bA

co

90
26

aro

36
25

urG
m

15
20

B
88 h
pO e
22 cd lp ss

urC
m

13
24

sT

yjc

07
16
IS

13 14
20 20

A B
trk trk

rp

Q
fts

12
20

Q
32 sB fp
25 nu e pep

10
24

78
38

13
dD
fa

54
29

co

sA

de

53
29

bB

09
24

89
26

ad

PE

83 M
22 lip

12 13 14 15 616
36 36 36 36
3

11
36

sA
cp

08 49
16 33
IS

88
30

52
29

sG

cy

'
61
48 15
33 IS

aro

87
30

51
29

efp

86 87 688
26 26
2

29
25

06
24

05
24

fd

xA

44 31 146 147 148 fiH


y
21 ag
2 2
2
w

B
pit

43
21

ots

55 Z W roV
37 pro pro
p

fts

ars

82 83
34 34

25
32

85
30

rr
m

pro

29

dD

fa

81
34

24
32

R
lip

49
29

ars

43 44
28 28

83
26

25
25

pA

26 27
25 25

pR

lp

le

80
22

10
61
IS

42
21

uU
le

78 79
22 22

05
20

2
40
pE
21
da

04
20

Nature Macmillan Publishers Ltd 1998

4,350,000

4,200,000

4,050,000

3,900,000

3,750,000

3,600,000

3,450,000

3,300,000

3,150,000

3,000,000

2,850,000

2,700,000

2,550,000

2,400,000

Rv0823c

Rv0827c

Rv0890c

Rv0891c
Rv0894
Rv1019

Rv1049

Rv1129c

Rv1151c
Rv1152

Rv1167c
Rv1219c
Rv1255c

Rv1332
Rv1353c

Rv1358

Rv1359
Rv1395

Rv1404

Rv1423
Rv1460
Rv1474c

Rv1534

Rv1556
Rv1674c
Rv1675c
Rv1719

Rv1773c

Rv1776c
Rv1816
Rv1846c
Rv1931c

Rv1956
Rv1963c
Rv1985c

Rv1990c
Rv1994c

Rv2017

Rv2021c
Rv2034

Rv2175c
Rv2250c
Rv2258c
Rv2282c

Rv2308
Rv2324

Rv2358

Rv2488c

Rv2506

Rv2621c
Rv2640c

Rv2642

Rv2669
Rv2745c
Rv2779c

Rv2887

Rv2912c

Rv2989

Rv3050c
Rv3055
Rv3058c
Rv3060c

Rv3066
Rv3095
Rv3124

family)
transcriptional regulator
(NifR3/Smm1 family)
transcriptional regulator (ArsR
family)
transcriptional regulator
(LuxR/UhpA family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (MarR
family)
transcriptional regulator
(PbsX/Xre family)
putative transcriptional regulator
transcriptional regulator (GntR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(LuxR/UhpA family)
putative transcriptional regulator
transcriptional regulator
(AraC/XylS family)
transcriptional regulator (MarR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (IclR
family)
transcriptional regulator (IclR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(AraC/XylS family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (LysR
family)
putative transcriptional regulator
transcriptional regulator (MerR
family)
putative transcriptional regulator
(PbsX/Xre family)
putative transcriptional regulator
transcriptional regulator (ArsR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (LysR
family)
putative transcriptional regulator
transcriptional regulator
(Lrp/AsnC family)
transcriptional regulator (ArsR
family)
transcriptional regulator
(LuxR/UhpA family)
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
transcriptional regulator (ArsR
family)
transcriptional regulator (ArsR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(Lrp/AsnC family)
transcriptional regulator (MarR
family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (IclR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (GntR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(AfsR/DndI/RedD family)

Rv3160c
Rv3167c
Rv3173c

Rv3183
Rv3208

Rv3249c

Rv3291c

Rv3295

Rv3334

Rv3405c
Rv3522
Rv3557c

Rv3574

Rv3575c

Rv3583c
Rv3676

Rv3678c

Rv3736

Rv3744

Rv3830c

Rv3833

Rv3840
Rv3855

putative transcriptional regulator


putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(Lrp/AsnC family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (MerR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (LacI
family)
putative transcriptional regulator
transcriptional regulator (Crp/Fnr
family)
transcriptional regulator (LysR
family)
transcriptional regulator
(AraC/XylS family)
transcriptional regulator (ArsR
family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(AraC/XylS family)
putative transcriptional regulator
putative transcriptional regulator

2. Two component systems


Rv1028c kdpD
sensor histidine kinase
Rv1027c kdpE
two-component response
regulator
Rv3246c mtrA
two-component response
regulator
Rv3245c mtrB
sensor histidine kinase
Rv0844c narL
two-component response
regulator
Rv0757
phoP
two-component response
regulator
Rv0758
phoR
sensor histidine kinase
Rv0491
regX3
two-component response
regulator
Rv0490
senX3
sensor histidine kinase
Rv0602c tcrA
two-component response
regulator
Rv0260c two-component response
regulator
Rv0600c sensor histidine kinase
Rv0601c sensor histidine kinase
Rv0818
two-component response
regulator
Rv0845
sensor histidine kinase
Rv0902c sensor histidine kinase
Rv0903c two-component response
regulator
Rv0981
two-component response
regulator
Rv0982
sensor histidine kinase
Rv1032c sensor histidine kinase
Rv1033c two-component response
regulator
Rv1626
two-component response
regulator
Rv2027c sensor histidine kinase
Rv2884
two-component response
regulator
Rv3132c sensor histidine kinase
Rv3133c two-component response
regulator
Rv3143
putative sensory transduction
protein
Rv3220c sensor histidine kinase
Rv3764c sensor histidine kinase
Rv3765c two-component response
regulator
3. Serine-threonine protein kinases and phosphoprotein
phosphatases
Rv0015c pknA
serine-threonine protein kinase
Rv0014c pknB
serine-threonine protein kinase
Rv0931c pknD
serine-threonine protein kinase
Rv1743
pknE
serine-threonine protein kinase
Rv1746
pknF
serine-threonine protein kinase
Rv0410c pknG
serine-threonine protein kinase
Rv1266c pknH
serine-threonine protein kinase
Rv2914c pknI
serine-threonine protein kinase
Rv2088
pknJ
serine-threonine protein kinase
Rv3080c pknK
serine-threonine protein kinase
Rv2176
pknL
serine-threonine protein kinase,

Nature Macmillan Publishers Ltd 1998

Rv0018c

ppp

Rv2234

ptpA

Rv0153c

truncated
putative phosphoprotein phosphatase
low molecular weight protein-tyrosine-phosphatase
putative protein-tyrosine-phosphatase

II. Macromolecule metabolism


A. Synthesis and modification of macromolecules
1. Ribosomal protein synthesis and modification
Rv3420c rimI
ribosomal protein S18 acetyl
transferase
Rv0995
rimJ
acetylation of 30S S5 subunit
Rv0641
rplA
50S ribosomal protein L1
Rv0704
rplB
50S ribosomal protein L2
Rv0701
rplC
50S ribosomal protein L3
Rv0702
rplD
50S ribosomal protein L4
Rv0716
rplE
50S ribosomal protein L5
Rv0719
rplF
50S ribosomal protein L6
Rv0056
rplI
50S ribosomal protein L9
Rv0651
rplJ
50S ribosomal protein L10
Rv0640
rplK
50S ribosomal protein L11
Rv0652
rplL
50S ribosomal protein L7/L12
Rv3443c rplM
50S ribosomal protein L13
Rv0714
rplN
50S ribosomal protein L14
Rv0723
rplO
50S ribosomal protein L15
Rv0708
rplP
50S ribosomal protein L16
Rv3456c rplQ
50S ribosomal protein L17
Rv0720
rplR
50S ribosomal protein L18
Rv2904c rplS
50S ribosomal protein L19
Rv1643
rplT
50S ribosomal protein L20
Rv2442c rplU
50S ribosomal protein L21
Rv0706
rplV
50S ribosomal protein L22
Rv0703
rplW
50S ribosomal protein L23
Rv0715
rplX
50S ribosomal protein L24
Rv1015c rplY
50S ribosomal protein L25
Rv2441c rpmA
50S ribosomal protein L27
Rv0105c rpmB
50S ribosomal protein L28
Rv2058c rpmB2 50S ribosomal protein L28
Rv0709
rpmC
50S ribosomal protein L29
Rv0722
rpmD
50S ribosomal protein L30
Rv1298
rpmE
50S ribosomal protein L31
Rv2057c rpmG
50S ribosomal protein L33
Rv3924c rpmH
50S ribosomal protein L34
Rv1642
rpmI
50S ribosomal protein L35
Rv3461c rpmJ
50S ribosomal protein L36
Rv1630
rpsA
30S ribosomal protein S1
Rv2890c rpsB
30S ribosomal protein S2
Rv0707
rpsC
30S ribosomal protein S3
Rv3458c rpsD
30S ribosomal protein S4
Rv0721
rpsE
30S ribosomal protein S5
Rv0053
rpsF
30S ribosomal protein S6
Rv0683
rpsG
30S ribosomal protein S7
Rv0718
rpsH
30S ribosomal protein S8
Rv3442c rpsI
30S ribosomal protein S9
Rv0700
rpsJ
30S ribosomal protein S10
Rv3459c rpsK
30S ribosomal protein S11
Rv0682
rpsL
30S ribosomal protein S12
Rv3460c rpsM
30S ribosomal protein S13
Rv0717
rpsN
30S ribosomal protein S14
Rv2056c rpsN2
30S ribosomal protein S14
Rv2785c rpsO
30S ribosomal protein S15
Rv2909c rpsP
30S ribosomal protein S16
Rv0710
rpsQ
30S ribosomal protein S17
Rv0055
rpsR
30S ribosomal protein S18
Rv2055c rpsR2
30S ribosomal protein S18
Rv0705
rpsS
30S ribosomal protein S19
Rv2412
rpsT
30S ribosomal protein S20
Rv3241c member of S30AE ribosomal
protein family
2. Ribosome modification and maturation
Rv1010
ksgA
16S rRNA dimethyltransferase
Rv2838c rbfA
ribosome-binding factor A
Rv2907c rimM
16S rRNA processing protein
3. Aminoacyl tRNA synthases and their modification
Rv2555c alaS
alanyl-tRNA synthase
Rv1292
argS
arginyl-tRNA synthase
Rv2572c aspS
aspartyl-tRNA synthase
Rv3580c cysS
cysteinyl-tRNA synthase
Rv2130c cysS2
cysteinyl-tRNA synthase
Rv1406
fmt
methionyl-tRNA formyltransferase
Rv3011c gatA
glu-tRNA-gln amidotransferase,
subunit B
Rv3009c gatB
glu-tRNA-gln amidotransferase,
subunit A
Rv3012c gatC
glu-tRNA-gln amidotransferase,
subunit C
Rv2992c gltS
glutamyl-tRNA synthase
Rv2357c glyS
glycyl-tRNA synthase
Rv2580c hisS
histidyl-tRNA synthase
Rv1536
ileS
isoleucyl-tRNA synthase
Rv0041
leuS
leucyl-tRNA synthase
Rv3598c lysS
lysyl-tRNA synthase
Rv1640c lysX
C-term lysyl-tRNA synthase
Rv1007c metS
methionyl-tRNA synthase
Rv1649
pheS
phenylalanyl-tRNA synthase
subunit

Rv1650
Rv2845c
Rv3834c
Rv2614c
Rv2906c
Rv3336c
Rv1689
Rv2448c

pheT
proS
serS
thrS
trmD
trpS
tyrS
valS

4. Nucleoproteins
Rv1407
fmu
Rv3852
hns
Rv2986c hupB
Rv1388
mIHF

phenylalanyl-tRNA synthase
subunit
prolyl-tRNA synthase
seryl-tRNA synthase
threonyl-tRNA synthase
tRNA (guanine-N1)-methyltransferase
tryptophanyl tRNA synthase
tyrosyl-tRNA synthase
valyl-tRNA synthase

similar to Fmu protein


HU-histone protein
DNA-binding protein II
integration host factor

5. DNA replication, repair, recombination and restriction/modification


Rv1317c alkA
DNA-3-methyladenine glycosidase II
Rv2836c dinF
DNA-damage-inducible protein F
Rv1329c dinG
probable ATP-dependent helicase
Rv3056
dinP
DNA-damage-inducible protein
Rv1537
dinX
probable DNA-damage-inducible
protein
Rv0001
dnaA
chromosomal replication initiator
protein
Rv0058
dnaB
DNA helicase (contains intein)
Rv1547
dnaE1
DNA polymerase III, subunit
Rv3370c dnaE2
DNA polymerase III chain
Rv2343c dnaG
DNA primase
Rv0002
dnaN
DNA polymerase III, subunit
Rv3711c dnaQ
DNA polymerase III e chain
Rv3721c dnaZX
DNA polymerase III, (dnaZ) and
(dnaX)
Rv2924c fpg
formamidopyrimidine-DNA glycosylase
Rv0006
gyrA
DNA gyrase subunit A
Rv0005
gyrB
DNA gyrase subunit B
Rv2092c helY
probable helicase, Ski2 subfamily
Rv2101
helZ
probable helicase, Snf2/Rad54
family
Rv2756c hsdM
type I restriction/modification system DNA methylase
Rv2755c hsdS'
type I restriction/modification system specificity determinant
Rv3296
lhr
ATP-dependent helicase
Rv3014c ligA
DNA ligase
Rv3062
ligB
DNA ligase
Rv3731
ligC
probable DNA ligase
Rv1020
mfd
transcription-repair coupling factor
Rv2528c mrr
restriction system protein
Rv2985
mutT1
MutT homologue
Rv1160
mutT2
MutT homologue
Rv0413
mutT3
MutT homologue
Rv3589
mutY
probable DNA glycosylase
Rv3297
nei
probable endonuclease VIII
Rv3674c nth
probable endonuclease III
Rv1316c ogt
methylated-DNA-protein-cysteine
methyltransferase
Rv1629
polA
DNA polymerase I
Rv1402
priA
putative primosomal protein n'
(replication factor Y)
Rv3585
radA
probable DNA repair RadA homologue
Rv2737c recA
recombinase (contains intein)
Rv0630c recB
exodeoxyribonuclease V
Rv0631c recC
exodeoxyribonuclease V
Rv0629c recD
exodeoxyribonuclease V
Rv0003
recF
DNA replication and SOS induction
Rv2973c recG
ATP-dependent DNA helicase
Rv1696
recN
recombination and DNA repair
Rv3715c recR
RecBC-Independent process of
DNA repair
Rv2736c recX
regulatory protein for RecA
Rv2593c ruvA
Holliday junction binding protein,
DNA helicase
Rv2592c ruvB
Holliday junction binding protein
Rv2594c ruvC
Holliday junction resolvase, endodeoxyribonuclease
Rv0054
ssb
single strand binding protein
Rv1210
tagA
DNA-3-methyladenine glycosidase I
Rv3646c topA
DNA topoisomerase
Rv2976c ung
uracil-DNA glycosylase
Rv1638
uvrA
excinuclease ABC subunit A
Rv1633
uvrB
excinuclease ABC subunit B
Rv1420
uvrC
excinuclease ABC subunit C
Rv0949
uvrD
DNA-dependent ATPase I and
helicase II
Rv3198c uvrD2
putative UvrD
Rv0427c xthA
exodeoxyribonuclease III
Rv0071
group II intron maturase
Rv0861c probable DNA helicase
Rv0944
possible formamidopyrimidineDNA glycosylase
Rv1688
probable 3-methylpurine DNA
glycosylase

Rv2090

Rv2191

Rv2464c

Rv3201c

Rv3202c
Rv3263
Rv3644c

6. Protein
Rv0429c
Rv2534c
Rv2882c
Rv0684
Rv0120c
Rv1080c
Rv3462c
Rv2839c
Rv1641
Rv0009
Rv2582
Rv1299
Rv3105c
Rv2889c
Rv0685

translation
def
efp
frr
fusA
fusA2
greA
infA
infB
infC
ppiA
ppiB
prfA
prfB
tsf
tuf

partially similar to DNA polymerase I


similar to both PolC and UvrC
proteins
probable DNA glycosylase,
endonuclease VIII
probable ATP-dependent DNA
helicase
similar to UvrD proteins
probable DNA methylase
similar in N-term to DNA polymerase III

2. DNA
Rv0670
Rv1108c
Rv1107c

Rv2460c

clpP2

and modification
polypeptide deformylase
elongation factor P
ribosome recycling factor
elongation factor G
elongation factor G
transcription elongation factor G
initiation factor IF-1
initiation factor IF-2
initiation factor IF-3
peptidyl-prolyl cis-trans isomerase
peptidyl-prolyl cis-trans isomerase
peptide chain release factor 1
peptide chain release factor 2
elongation factor EF-Ts
elongation factor EF-Tu

Rv2457c

clpX

7. RNA synthesis, RNA modification and DNA


transcription
Rv1253
deaD
ATP-dependent DNA/RNA
helicase
Rv2783c gpsI
pppGpp synthase and polyribonucleotide phosphorylase
Rv2841c nusA
transcription termination factor
Rv2533c nusB
N-utilization substance protein B
Rv0639
nusG
transcription antitermination
protein
Rv3907c pcnA
polynucleotide polymerase
Rv3232c pvdS
alternative sigma factor for
siderophore production
Rv3211
rhlE
probable ATP-dependent
RNA helicase
Rv1297
rho
transcription termination
factor rho
Rv3457c rpoA
subunit of RNA polymerase
Rv0667
rpoB
subunit of RNA polymerase
Rv0668
rpoC
' subunit of RNA polymerase
Rv1364c rsbU
SigB regulation protein
Rv3287c rsbW
anti-sigma B factor
Rv2703
sigA
RNA polymerase sigma factor
(aka MysA, RpoV)
Rv2710
sigB
RNA polymerase sigma factor
(aka MysB)
Rv2069
sigC
ECF subfamily sigma subunit
Rv3414c sigD
ECF subfamily sigma subunit
Rv1221
sigE
ECF subfamily sigma subunit
Rv3286c sigF
ECF subfamily sigma subunit
Rv0182c sigG
sigma-70 factors ECF subfamily
Rv3223c sigH
ECF subfamily sigma subunit
Rv1189
sigI
ECF family sigma factor
Rv3328c sigJ
similar to SigI, ECF family
Rv0445c sigK
ECF-type sigma factor
Rv0735
sigL
sigma-70 factors ECF subfamily
Rv3911
sigM
probable sigma factor, similar to
SigE
Rv3366
spoU
probable rRNA methylase
Rv3455c truA
probable pseudouridylate synthase
Rv2793c truB
tRNA pseudouridine 55 synthase
Rv1644
tsnR
putative 23S rRNA methyltransferase
Rv3649
ATP-dependent DNA/RNA helicase
8. Polysaccharides (cytoplasmic)
Rv1326c glgB
1,4--glucan branching enzyme
Rv1328
glgP
probable glycogen phosphorylase
Rv1564c glgX
probable glycogen debranching
enzyme
Rv1563c glgY
putative -amylase
Rv1562c glgZ
maltooligosyltrehalose trehalohydrolase
Rv0126
probable glycosyl hydrolase
Rv1781c probable 4--glucanotransferase
Rv2471
probable maltase -glucosidase

B. Degradation of macromolecules
1. RNA
Rv1014c pth
peptidyl-tRNA hydrolase
Rv2925c rnc
RNAse III
Rv2444c rne
similar at C-term to ribonuclease E
Rv2902c rnhB
ribonuclease HII
Rv3923c rnpA
ribonuclease P protein component
Rv1340
rphA
ribonuclease PH

Nature Macmillan Publishers Ltd 1998

end
xseA
xseB

endonuclease IV (apurinase)
exonuclease VII large subunit
exonuclease VII small subunit

3. Proteins, peptides
Rv3305c amiA
Rv3306c amiB
Rv3596c clpC
Rv2461c clpP

Rv2667

clpX'

Rv3419c
Rv2725c
Rv1223
Rv2861c
Rv0734

gcp
hflX
htrA
map
map'

Rv0319
Rv0125
Rv2213
Rv0800
Rv2467
Rv2089c
Rv2535c
Rv2782c

pcp
pepA
pepB
pepC
pepD
pepE
pepQ
pepR

Rv2109c
Rv2110c
Rv0782
Rv0781
Rv0724

prcA
prcB
ptrBa
ptrBb
sppA

Rv0198c
Rv0457c
Rv0840c
Rv0983
Rv1977
Rv3668c
Rv3671c
Rv3883c
Rv3886c

4. Polysaccharides,
lipids
Rv0062
celA
Rv3915
cwlM
Rv0315
Rv1090
Rv1327c

Rv1333
Rv3463
Rv3717

and glycopeptides
probable aminohydrolase
probable aminohydrolase
ATP-dependent Clp protease
ATP-dependent Clp protease proteolytic subunit
ATP-dependent Clp protease proteolytic subunit
ATP-dependent Clp protease
ATP-binding subunit ClpX
similar to ClpC from M. leprae but
shorter
glycoprotease
GTP-binding protein
serine protease
methionine aminopeptidase
probable methionine aminopeptidase
pyrrolidone-carboxylate peptidase
probable serine protease
aminopeptidase A/I
aminopeptidase I
probable aminopeptidase
cytoplasmic peptidase
cytoplasmic peptidase
protease/peptidase, M16 family
(insulinase)
proteasome -type subunit 1
proteasome -type subunit 2
protease II, subunit
protease II, subunit
protease IV, signal peptide peptidase
probable zinc metalloprotease
probable peptidase
probable proline iminopeptidase
probable serine protease
probable zinc metallopeptidase
probable alkaline serine protease
probable serine protease
probable secreted protease
protease

lipopolysaccharides and phosphocellulase/endoglucanase


hydrolase
probable -1,3-glucanase
probable inactivated
cellulase/endoglucanase
probable glycosyl hydrolase, amylase family
probable hydrolase
probable neuraminidase
possible N-acetylmuramoyl-L-alanine amidase

5. Esterases and lipases


Rv0220
lipC
probable esterase
Rv1923
lipD
probable esterase
Rv3775
lipE
probable hydrolase
Rv3487c lipF
probable esterase
Rv0646c lipG
probable hydrolase
Rv1399c lipH
probable lipase
Rv1400c lipI
probable lipase
Rv1900c lipJ
probable esterase
Rv2385
lipK
probable acetyl-hydrolase
Rv1497
lipL
esterase
Rv2284
lipM
probable esterase
Rv2970c lipN
probable lipase/esterase
Rv1426c lipO
probable esterase
Rv2463
lipP
probable esterase
Rv2485c lipQ
probable carboxlyesterase
Rv3084
lipR
probable acetyl-hydrolase
Rv3176c lipS
probable esterase/lipase
Rv2045c lipT
probable carboxylesterase
Rv1076
lipU
probable esterase
Rv3203
lipV
probable lipase
Rv0217c lipW
probable esterase
Rv2351c plcA
phospholipase C precursor
Rv2350c plcB
phospholipase C precursor
Rv2349c plcC
phospholipase C precursor
Rv1755c plcD
partial CDS for phospholipase C
Rv1104
probable esterase pseudogene
Rv1105
probable esterase pseudogene
6. Aromatic hydrocarbons
Rv3469c mhpE
probable 4-hydroxy-2-oxovalerate
aldolase
Rv0316
probable muconolactone isomerase
Rv0771
probable 4-carboxymuconolactone decarboxylase
Rv0939
probable dehydrase
Rv1723
6-aminohexanoate-dimer hydro-

Rv2715

Rv3530c
Rv3534c
Rv3536c

lase
2-hydroxymuconic semialdehyde
hydrolase
probable cis-diol dehydrogenase
4-hydroxy-2-oxovalerate aldolase
aromatic hydrocarbon degradation

C. Cell envelope
1. Lipoproteins (lppA-lpr0) 65
2. Surface polysaccharides, lipopolysaccharides, proteins and antigens
Rv0806c cpsY
probable UDP-glucose-4epimerase
Rv3811
csp
secreted protein
Rv1677
dsbF
highly similar to C-term Mpt53
Rv3794
embA
involved in arabinogalactan synthesis
Rv3795
embB
involved in arabinogalactan synthesis
Rv3793
embC
involved in arabinogalactan synthesis
Rv3875
esat6
early secretory antigen target
Rv0112
gca
probable GDP-mannose dehydratase
Rv0113
gmhA
phosphoheptose isomerase
Rv2965c kdtB
lipopolysaccharide core biosynthesis protein
Rv2878c mpt53
secreted protein Mpt53
Rv1980c mpt64
secreted immunogenic protein
Mpb64/Mpt64
Rv2875
mpt70
major secreted immunogenic protein Mpt70 precursor
Rv2873
mpt83
surface lipoprotein Mpt83
Rv0899
ompA
member of OmpA family
Rv3810
pirG
cell surface protein precursor (Erp
protein)
Rv3782
rfbE
similar to rhamnosyl transferase
Rv1302
rfe
undecaprenyl-phosphate -Nacetylglucosaminyltransferase
Rv2145c wag31
antigen 84 (aka wag31)
Rv0431
tuberculin related peptide (AT103)
Rv0954
cell envelope antigen
Rv1514c involved in polysaccharide synthesis
Rv1518
involved in exopolysaccharide
synthesis
Rv1758
partial cutinase
Rv1910c probable secreted protein
Rv1919c weak similarity to pollen antigens
Rv1984c probable secreted protein
Rv1987
probable secreted protein
Rv2223c probable exported protease
Rv2224c probable exported protease
Rv2301
probable cutinase
Rv2345
precursor of probable membrane
protein
Rv2672
putative exported protease
Rv3019c similar to Esat6
Rv3036c probable secreted protein
Rv3449
probable precursor of serine protease
Rv3451
probable cutinase
Rv3452
probable cutinase precursor
Rv3724
probable cutinase precursor
3. Murein
Rv2911
Rv2981c
Rv3809c
Rv1018c
Rv3382c
Rv1110
Rv1315
Rv0482
Rv2152c
Rv2155c
Rv2158c
Rv2157c
Rv2153c
Rv1338
Rv2156c
Rv3332
Rv0016c
Rv2163c
Rv0050
Rv3682
Rv0017c
Rv0907

sacculus and peptidoglycan


dacB
penicillin binding protein
ddlA
D-alanine-D-alanine ligase A
glf
UDP-galactopyranose mutase
glmU
UDP-N-acetylglucosamine
pyrophosphorylase
lytB
LytB protein homologue
lytB'
very similar to LytB
murA
UDP-N-acetylglucosamine-1-carboxyvinyltransferase
murB
UDP-N-acetylenolpyruvoylglucosamine reductase
murC
UDP-N-acetyl-muramate-alanine
ligase
murD
UDP-N-acetylmuramoylalanine-Dglutamate ligase
murE
meso-diaminopimelate-adding
enzyme
murF
D-alanine:D-alanine-adding
enzyme
murG
transferase in peptidoglycan synthesis
murI
glutamate racemase
murX
phospho-N-acetylmuramoylpetapeptide transferase
nagA
N-acetylglucosamine-6-Pdeacetylase
pbpA
penicillin-binding protein
pbpB
penicillin-binding protein 2
ponA
penicillin-bonding protein
ponA'
class A penicillin binding protein
rodA
FtsW/RodA/SpovE family
probable penicillin binding protein

Rv1367c
Rv1730c
Rv1922
Rv2864c
Rv3330
Rv3627c

probable
probable
probable
probable
probable
probable

penicillin
penicillin
penicillin
penicillin
penicillin
penicillin

binding
binding
binding
binding
binding
binding

protein
protein
protein
protein
protein
protein

4. Conserved membrane proteins


Rv0402c mmpL1 conserved large membrane
protein
Rv0507
mmpL2 conserved large membrane
protein
Rv0206c mmpL3 conserved large membrane
protein
Rv0450c mmpL4 conserved large membrane
protein
Rv0676c mmpL5 conserved large membrane
protein
Rv1557
mmpL6 conserved large membrane
protein
Rv2942
mmpL7 conserved large membrane
protein
Rv3823c mmpL8 conserved large membrane
protein
Rv2339
mmpL9 conserved large membrane
protein
Rv1183
mmpL10 conserved large membrane
protein
Rv0202c mmpL11 conserved large membrane
protein
Rv1522c mmpL12 conserved large membrane
protein
Rv0403c mmpS1 conserved small membrane
protein
Rv0506
mmpS2 conserved small membrane
protein
Rv2198c mmpS3 conserved small membrane
protein
Rv0451c mmpS4 conserved small membrane
protein
Rv0677c mmpS5 conserved small membrane
protein
5. Other membrane proteins 211
III. Cell processes
A. Transport/binding proteins
1. Amino acids
Rv2127
ansP
L-asparagine permease
Rv0346c aroP2
probable aromatic amino acid
permease
Rv0917
betP
glycine betaine transport
Rv1704c cycA
transport of D-alanine, D-serine
and glycine
Rv3666c dppA
probable peptide transport system
permease
Rv3665c dppB
probable peptide transport system
permease
Rv3664c dppC
probable peptide transport system
permease
Rv3663c dppD
probable ABC-transporter
Rv0522
gabP
probable 4-amino butyrate transporter
Rv0411c glnH
putative glutamine binding protein
Rv2564
glnQ
probable ATP-binding transport
protein
Rv1280c oppA
probable oligopeptide transport
protein
Rv1283c oppB
oligopeptide transport protein
Rv1282c oppC
oligopeptide transport system permease
Rv1281c oppD
probable peptide transport protein
Rv2320c rocE
arginine/ornithine transporter
Rv3253c probable cationic amino acid
transport
Rv3454
possible proline permease
2. Cations
Rv2920c amt
Rv1607
chaA
Rv1239c corA
Rv0092
Rv0103c
Rv3270
Rv1469

ctpA
ctpB
ctpC
ctpD

Rv0908
Rv1997
Rv1992c
Rv0425c

ctpE
ctpF
ctpG
ctpH

Rv0107c

ctpI

Rv0969
Rv3044
Rv0265c
trate
Rv1029

ctpV
fecB
fecB2
kdpA

putative ammonium transporter


putative calcium/proton antiporter
probable magnesium and cobalt
transport protein
cation-transporting ATPase
cation transport ATPase
cation transport ATPase
probable cadmium-transporting
ATPase
probable cation transport ATPase
probable cation transport ATPase
probable cation transport ATPase
C-terminal region putative cationtransporting ATPase
probable magnesium transport
ATPase
cation transport ATPase
putative FeIII-dicitrate transporter
iron transport protein FeIII dicitransporter
potassium-transporting ATPase A
chain

Nature Macmillan Publishers Ltd 1998

Rv1030

kdpB

Rv1031

kdpC

Rv3236c

kefB

Rv2877c

merT

Rv1811

mgtC

Rv0362

mgtE

Rv2856
Rv0924c

nicT
nramp

Rv2691

trkA

Rv2692

trkB

Rv2287
Rv2723

yjcE
-

Rv3162c
Rv3237c

Rv3743c

potassium-transporting ATPase B
chain
potassium-transporting ATPase C
chain
probable glutathione-regulated
potassium-efflux protein
possible mercury resistance
transport system
probable magnesium transport
ATPase protein C
putative magnesium ion
transporter
probable nickel transport protein
transmembrane protein belonging
to Nramp family
probable potassium uptake protein
probable potassium uptake protein
probable Na+/H+ exchanger
probable membrane protein,
tellurium resistance
probable membrane protein
possible potassium channel
protein
probable cation-transporting
ATPase

3. Carbohydrates, organic acids and alcohols


Rv2443
dctA
C4-dicarboxylate transport protein
Rv3476c kgtP
sugar transport protein
Rv1902c nanT
probable sialic acid transporter
Rv1236
sugA
membrane protein probably
involved in sugar transport
Rv1237
sugB
sugar transport protein
Rv1238
sugC
ABC transporter component of
sugar uptake system
Rv3331
sugI
probable sugar transport protein
Rv2835c ugpA
sn-glycerol-3-phosphate
permease
Rv2833c ugpB
sn-glycerol-3-phosphate-binding
periplasmic lipoprotein
Rv2832c ugpC
sn-glycerol-3-phosphate transport
ATP-binding protein
Rv2834c ugpE
sn-glycerol-3-phosphate transport
system protein
Rv2316
uspA
sugar transport protein
Rv2318
uspC
sugar transport protein
Rv2317
uspE
sugar transport protein
Rv1200
probable sugar transporter
Rv2038c probable ABC sugar transporter
Rv2039c probable sugar transporter
Rv2040c probable sugar transporter
Rv2041c probable sugar transporter
4. Anions
Rv2684
Rv2685
Rv3578
Rv2643
Rv2397c

arsA
arsB
arsB2
arsC
cysA

Rv2399c

cysT

Rv2398c

cysW

Rv1857
Rv1858

modA
modB

Rv1859

modC

Rv1860

modD

Rv2329c
Rv1737c
Rv0261c
Rv0267

narK1
narK2
narK3
narU

Rv0934

phoS1

Rv0928

phoS2

Rv0820

phoT

Rv3301c

phoY1

Rv0821c

phoY2

Rv0545c

pitA

Rv2281
Rv0930

pitB
pstA1

Rv0936

pstA2

Rv0933

pstB

Rv0935

pstC

Rv0929

pstC2

probable arsenical pump


probable arsenical pump
probable arsenical pump
probable arsenical pump
sulphate transport ATP-binding
protein
sulphate transport system permease protein
sulphate transport system permease protein
molybdate binding protein
transport system permease,
molybdate uptake
molybdate uptake ABCtransporter
precursor of Apa (45/47
kD secreted protein)
probable nitrite extrusion protein
nitrite extrusion protein
nitrite extrusion protein1
similar to nitrite extrusion
protein 2
PstS component of phosphate
uptake
PstS component of phosphate
uptake
phosphate transport system ABC
transporter
phosphate transport system
regulator
phosphate transport system
regulator
low-affinity inorganic phosphate
transporter
phosphate permease
PstA component of phosphate
uptake
PstA component of phosphate
uptake
ABC transport component of
phosphate uptake
PstC component of phosphate
uptake
membrane-bound component of

Rv0932c

pstS

Rv2400c
Rv0143c
Rv1707
Rv1739c
Rv3679
Rv3680

subI
-

phosphate transport system


PstS component of phosphate
uptake
sulphate binding precursor
probable chloride channel
probable sulphate permease
possible sulphate transporter
possible anion transporter
probable anion transporter

5. Fatty acid transport


Rv2790c ltp1
non-specific lipid transport protein
Rv3540c ltp2
non-specific lipid transport protein
6. Efflux proteins
Rv2936
drrA
Rv2937

drrB

Rv2938

drrC

Rv2846c
Rv3065
Rv0783c
Rv0849
Rv1145
Rv1146
Rv1250
Rv1258c

efpA
emrE
-

Rv1410c
Rv1634
Rv1819c

Rv2136c

Rv2209
Rv2333c

Rv2994

Rv1877
Rv2459

similar daunorubicin resistance


ABC-transporter
similar daunorubicin resistance
transmembrane protein
similar daunorubicin resistance
transmembrane protein
putative efflux protein
resistance to ethidium bromide
multidrug resistance protein
possible quinolone efflux pump
probable drug transporter
probable drug transporter
probable drug efflux protein
probable multidrug resistance
pump
probable drug efflux protein
probable drug efflux protein
probable multidrug resistance
pump
putative bacitracin resistance protein
probable drug efflux protein
probable tetracenomycin C resistance protein
probable fluoroquinolone efflux
protein
probable drug efflux protein
probable drug efflux protein

B. Chaperones/Heat shock
Rv0384c clpB
heat shock protein
Rv0352
dnaJ
acts with GrpE to stimulate DnaK
ATPase
Rv2373c dnaJ2
DnaJ homologue
Rv0350
dnaK
70 kD heat shock protein, chromosome replication
Rv3417c groEL1 60 kD chaperonin 1
Rv0440
groEL2 60 kD chaperonin 2
Rv3418c groES
10 kD chaperone
Rv0351
grpE
stimulates DnaK ATPase activity
Rv2374c hrcA
heat-inducible transcription
repressor
Rv0251c hsp
possible heat shock protein
Rv0353
hspR
heat shock regulator
Rv2031c hspX
14kD antigen, heat shock protein
Hsp20 family
Rv2299c htpG
heat shock protein Hsp90 family
Rv0563
htpX
probable (transmembrane) heat
shock protein
Rv2701c suhB
putative extragenic suppressor
protein
Rv3269
probable heat shock protein
C. Cell division
Rv3641c fic
Rv3102c ftsE
Rv3610c ftsH
Rv2748c
Rv2151c
Rv2154c

ftsK
ftsQ
ftsW

Rv3101c
Rv2921c
Rv2150c
Rv3919c
Rv3625c
Rv3917c

ftsX
ftsY
ftsZ
gid
mesJ
parA

Rv3918c

parB

Rv2922c

smc

Rv0012
Rv0435c
Rv2115c
Rv3213c

Rv1708

possible cell division protein


membrane protein
inner membrane protein,
chaperone
chromosome partitioning
ingrowth of wall at septum
membrane protein (shape determination)
membrane protein
cell division protein FtsY
circumferential ring, GTPase
glucose inhibited division protein B
probable cell cycle protein
chromosome partitioning; DNA binding
possibly involved in chromosome
partitioning
member of Smc1/Cut3/Cut14
family
possible cell division protein
ATPase of AAA-family
ATPase of AAA-family
possible role in chromosome segregation
possible role in chromosome partitioning

D. Protein and peptide secretion


Rv2916c ffh
signal recognition particle protein
Rv2903c lepB
signal peptidase I
Rv1614
lgt
prolipoprotein diacylglyceryl transferase
Rv1539
lspA
lipoprotein signal peptidase
Rv0379
sec
probable transport protein
SecE/Sec61- family
Rv3240c secA
SecA, preprotein translocase sub-

Rv1821

secA2

Rv2587c
Rv0638
Rv2586c
Rv1440

secD
secE
secF
secG

Rv0732

secY

Rv2462c

tig

Rv2813

unit
SecA, preprotein translocase subunit
protein-export membrane protein
SecE preprotein translocase
protein-export membrane protein
protein-export membrane protein
SecG
SecY subunit of preprotein translocase
chaperone protein, similar to
trigger factor
probable general secretion pathway protein

E. Adaptations and atypical conditions


Rv1901
cinA
competence damage protein
Rv3648c cspA
cold shock protein, transcriptional
regulator
Rv0871
cspB
probable cold shock protein
Rv3063
cstA
starvation-induced stress
response protein
Rv3490
otsA
probable ,-trehalose-phosphate
synthase
Rv2006
otsB
trehalose-6-phosphate phosphatase
Rv3372
otsB2
trehalose-6-phosphate phosphatase
Rv3758c proV
osmoprotection ABC transporter
Rv3757c proW
transport system permease
Rv3759c proX
similar to osmoprotection proteins
Rv3756c proZ
transport system permease
Rv1026
probable pppGpp-5'phosphohydrolase
F. Detoxification
Rv2428
ahpC
Rv2429
ahpD
Rv2238c ahpE
Rv2521
bcp
Rv1608c bcpB
Rv3473c

bpoA

Rv1123c

bpoB

Rv0554

bpoC

Rv3617
Rv1938
Rv1124
Rv2214c
Rv3670
Rv0134
Rv3171c

ephA
ephB
ephC
ephD
ephE
ephF
hpx

Rv1908c
Rv3846
Rv0432

katG
sodA
sodC

Rv1932
Rv0634c
Rv2581c
Rv3177

tpx
-

IV. Other
A. Virulence
Rv0169
mce1
Rv0589
mce2
Rv1966
mce3
Rv3499c mce4
Rv3100c smpB
Rv1694
tlyA
Rv0024
Rv0167
Rv0168
Rv0170
Rv0171
Rv0172
Rv0174
Rv0587
Rv0588
Rv0590
Rv0591
Rv0592
Rv0594
Rv1085c Rv1477
Rv1478

Rv1566c

Rv1964
Rv1965
Rv1967
Rv1968
Rv1969
Rv1971
Rv2190c
Rv3494c
Rv3496c
Rv3497c
Rv3498c

alkyl hydroperoxide reductase


member of AhpC/TSA family
member of AhpC/TSA family
bacterioferritin comigratory protein
probable bacterioferritin comigratory protein
probable non-heme bromoperoxidase
probable non-heme bromoperoxidase
probable non-heme bromoperoxidase
probable epoxide hydrolase
probable epoxide hydrolase
probable epoxide hydrolase
probable epoxide hydrolase
probable epoxide hydrolase
probable epoxide hydrolase
probable non-heme haloperoxidase
catalase-peroxidase
superoxide dismutase
superoxide dismutase precursor (Cu-Zn)
thiol peroxidase
putative glyoxylase II
putative glyoxylase II
probable non-heme haloperoxidase

cell invasion protein


cell invasion protein
cell invasion protein
cell invasion protein
probable small protein b
cytotoxin/hemolysin homologue
putative p60 homologue
part of mce1 operon
part of mce1 operon
part of mce1 operon
part of mce1 operon
part of mce1 operon
part of mce1 operon
part of mce2 operon
part of mce2 operon
part of mce2 operon
part of mce2 operon
part of mce2 operon
part of mce2 operon
possible hemolysin
putative exported p60 protein
homologue
putative exported p60 protein
homologue
putative exported p60 protein
homologue
part of mce3 operon
part of mce3 operon
part of mce3 operon
part of mce3 operon
part of mce3 operon
part of mce3 operon
putative p60 homologue
part of mce4 operon
part of mce4 operon
part of mce4 operon
part of
mce4 operon
Nature
Macmillan
Publishers Ltd 1998

Rv3500c
Rv3501c
Rv3896c
Rv3922c

part of mce4 operon


part of mce4 operon
putative p60 homologue
possible hemolysin

B. IS elements, Repeated sequences, and Phage


1. IS elements
IS6110
16 copies
IS1081
6 copies
Others
37 copies
2. REP13E12 family

7 copies

3. Phage-related functions
Rv2894c xerC
integrase/recombinase
Rv1701
xerD
integrase/recombinase
Rv1054
integrase-a
Rv1055
integrase-b
Rv1573
phiRV1 phage related protein
Rv1574
phiRV1 phage related protein
Rv1575
phiRV1 phage related protein
Rv1576c phiRV1 phage related protein
Rv1577c phiRV1 possible prohead protease
Rv1578c phiRV1 phage related protein
Rv1579c phiRV1 phage related protein
Rv1580c phiRV1 phage related protein
Rv1581c phiRV1 phage related protein
Rv1582c phiRV1 phage related protein
Rv1583c phiRV1 phage related protein
Rv1584c phiRV1 phage related protein
Rv1585c phiRV1 phage related protein
Rv1586c phiRV1 integrase
Rv2309c integrase
Rv2310
excisionase
Rv2646
phiRV2 integrase
Rv2647
phiRV2 phage related protein
Rv2650c phiRV2 phage related protein
Rv2651c phiRV2 prohead protease
Rv2652c phiRV2 phage related protein
Rv2653c phiRV2 phage related protein
Rv2654c phiRV2 phage related protein
Rv2655c phiRV2 phage related protein
Rv2656c phiRV2 phage related protein
Rv2657c similar to gp36 of mycobacteriophage L5
Rv2658c phiRV2 phage related protein
Rv2659c phiRV2 integrase
Rv2830c similar to phage P1 phd gene
Rv3750c excisionase
Rv3751
putative integrase

C. PE and PPE families


1. PE family
PE subfamily
38 members
PE_PGRS subfamily 61 members
2. PPE family

68 members

D. Antibiotic production and resistance


Rv2068c blaC
class A -lactamase
Rv3290c lat
lysine-e aminotransferase
Rv2043c pncA
pyrazinamide resistance/sensitivity
Rv0133
possible puromycin N-acetyltransferase
Rv0262c aminoglycoside 2'-N-acetyltransferase
Rv0802c acetyltransferase
Rv1082
similar to S. lincolnensis lmbE
Rv1170
similar to S. lincolnensis lmbE
Rv1347c possible aminoglycoside 6'-Nacetyltransferase
Rv2036
similar to lincomycin production
genes
Rv2303c similar to S. griseus macrotetrolide
resistance protein
Rv3225c probable aminoglycoside 3'-phosphotransferases
Rv3700c probable acetyltransferase
Rv3817
probable aminoglycoside 3'-phosphotransferase
E. Bacteriocin-like proteins

F. Cytochrome P450 enzymes

22

G. Coenzyme F420-dependent
enzymes

H. Miscellaneous transferases

61

I. Miscellaneous phosphatases, lyases,


and hydrolases

18

J. Cyclases

K. Chelatases

V. Conserved hypotheticals

912

VI. Unknowns

606

TOTAL

3924

You might also like