Professional Documents
Culture Documents
errata
Nature
393, 537544 (1998)
..........................................................................................................................................................................................................................................................................
As a result of an error during lm output, Table 1 was published with some symbols missing. The correct version can be found at
http://www.sanger.ac.uk and is reproduced again here (following pages).
Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the genes for mycolyl transferases were inverted:
Rv0129c encodes antigen 85C and not 85C9 as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect.
Immun. 59, 372382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our
attention.
The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion
instead of 29, as indicated here:
H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...
..........::::::::::::::::::::
:::::::::::::::::::::
BCG.......GSGAPGGAGGAAGLWGTGGA----------------GGAGGWLLGDGGAGGIGGAST...
190
letters to nature
191
letters to nature
192
letters to nature
193
letters to nature
194
letters to nature
195
letters to nature
196
letters to nature
197
letters to nature
198
article
Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus.
The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been
determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help
the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs,
contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid
content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding
capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of
glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
537
article
random small-insert clones from a whole-genome shotgun library.
This culminated in a composite sequence of 4,411,529 base pairs
(bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the
second-largest bacterial genome sequence currently available (after
that of Escherichia coli)9. The initiation codon for the dnaA gene, a
hallmark for the origin of replication, oriC, was chosen as the start
point for numbering. The genome is rich in repetitive DNA,
particularly insertion sequences, and in new multigene families
and duplicated housekeeping genes. The G + C content is relatively
constant throughout the genome (Fig. 1) indicating that horizontally transferred pathogenicity islands of atypical base composition
are probably absent. Several regions showing higher than average G
+ C content (Fig. 1) were detected; these correspond to sequences
belonging to a large gene family that includes the polymorphic G +
C-rich sequences (PGRSs).
Genes for stable RNA. Fifty genes coding for functional RNA
molecules were found. These molecules were the three species
produced by the unique ribosomal RNA operon, the 10Sa RNA
involved in degradation of proteins encoded by abnormal messenger RNA, the RNA component of RNase P, and 45 transfer RNAs.
No 4.5S RNA could be detected. The rrn operon is situated
unusually as it occurs about 1,500 kilobases (kb) from the putative
oriC; most eubacteria have one or more rrn operons near to oriC to
exploit the gene-dosage effect obtained during replication10. This
arrangement may be related to the slow growth of M. tuberculosis.
The genes encoding tRNAs that recognize 43 of the 61 possible sense
codons were distributed throughout the genome and, with one
0
4
M. tuberculosis
H37Rv
4,411,529 bp
others are pink) and the direct repeat region (pink cube); the second ring inwards
position and orientation of known genes and coding sequences (CDS). We used
shows the coding sequence by strand (clockwise, dark green; anticlockwise, light
the following functional categories (adapted from ref. 20): lipid metabolism
green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12
REP family, dark pink; prophage, blue); the fourth ring shows the positions of the
PPE family members (green); the fifth ring shows the PE family members (purple,
excluding PGRS); and the sixth ring shows the positions of the PGRS sequences
related functions (blue); stable RNAs (purple); cell wall and cell processes (dark
yellow, and .65% G + C in red. The figure was generated with software from
DNASTAR.
http://www.sanger.ac.uk.
538
article
Genes encoding proteins. 3,924 open reading frames were identified in the genome (see Methods), accounting for ,91% of the
potential coding capacity (Figs 1, 2). A few of these genes appear to
have in-frame stop codons or frameshift mutations (irrespective of
the source of the DNA sequenced) and may either use frameshifting
during translation or correspond to pseudogenes. Consistent with
the high G + C content of the genome, GTG initiation codons (35%)
are used more frequently than in Bacillus subtilis (9%) and E. coli
(14%), although ATG (61%) is the most common translational
start. There are a few examples of atypical initiation codons, the
most notable being the ATC used by infC, which begins with ATT in
both B. subtilis and E. coli9,14. There is a slight bias in the orientation
of the genes (Fig. 1) with respect to the direction of replication as
,59% are transcribed with the same polarity as replication,
compared with 75% in B. subtilis. In other bacteria, genes transcribed in the same direction as the replication forks are believed to
be expressed more efficiently9,14. Again, the more even distribution
in gene polarity seen in M. tuberculosis may reflect the slow growth
and infrequent replication cycles. Three genes (dnaB, recA and
Rv1461) have been invaded by sequences encoding inteins (protein
introns) and in all three cases their counterparts in M. leprae also
contain inteins, but at different sites15 (S.T.C. et al., unpublished
observations).
Protein function, composition and duplication. By using various
database comparisons, we attributed precise functions to ,40% of
the predicted proteins and found some information or similarity for
another 44%. The remaining 16% resembled no known proteins
and may account for specific mycobacterial functions. Examination
of the amino-acid composition of the M. tuberculosis proteome by
correspondence analysis16, and comparison with that of other
microorganisms whose genome sequences are available, revealed a
statistically significant preference for the amino acids Ala, Gly, Pro,
Arg and Trp, which are all encoded by G + C-rich codons, and a
comparative reduction in the use of amino acids encoded by A + Trich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach
also identified two groups of proteins rich in Asn or Gly that belong
to new families, PE and PPE (see below). The fraction of the
proteome that has arisen through gene duplication is similar to
that seen in E. coli or B. subtilis (,51%; refs 9, 14), except that the
level of sequence conservation is considerably higher, indicating
that there may be extensive redundancy or differential production
of the corresponding polypeptides. The apparent lack of divergence
following gene duplication is consistent with the hypothesis that
M. tuberculosis is of recent descent6.
General metabolism, regulation and drug resistance
F2 -22.6%
Glu
Mj
0.1
Tyr
Gly
Val
Ile
Lys
Af
Mth
Ae
Bs
Bb
Arg
Met
Mt
Cys
Asp
Hp
Phe
-0.1
Leu
Sc
Ser
Pro
Hi
Ce
His
Ec
Ss
Thr
Mg
Ala
Trp
Mp
Asn
-0.2
-0.3
Gln
-0.30
-0.15
0.15
0.30
F1 - 55.2%
539
article
Figure 4 Lipid metabolism. a, Degradation of
a
OH
SCoA
echA1-21
SCoA
-Oxidation
(fadA/fadB)
fadE1-36
Citric
acid
cycle
accA1/accD1
accA2/accD2
O
fadD1-36
Purines,
Pyrimidines
Glyoxylate
shunt
SCoA
NADH, FADH 2
SCoA
SCoA
fadA2-6
O
R
ATP
fadB2-5
Porphyrins
Amino acids
Glucose
S Co A
CO2
OH
Host membranes
H
OH
O
H
OH
fas
cma
genes
are
responsible
for
fabD
mabA
acpM
kasA
kasB
8 (kb)
accD
inhA
cmaA1
Gene
Function
fas
fabD
acpM
kasA
kasB
accD
mabA
inhA
cmaA1
cmaA2
cmaA2
c
CH 3
CH 3
O
ppsA
ppsB
ppsC
ppsD
ppsE
CH 3
CH 3
mas
CH 3
CH 3
10
20
30
OCH 3
CH 3
CH 3
CH 3
CH 3
40 (kb)
Gene
Function
mas
ppsA
ppsB
ppsC
ppsD
ppsE
540
O
O
article
Lipid metabolism
a
Representatives of the PE family
PE
~110 aa
PE
~110 aa
PGRS
(GGAGGA) n
0 .. >1,400 aa
unique sequence
0 .. ~500 aa
~180 aa
PPE
~180 aa
PPE
~180 aa
MPTR
(NxGxGNxG) n
~200 .. >3,500 aa
GxxSVPxxW
~200 .. ~400 aa
unique sequence
0 .. ~400 aa
b
PE-PGRS sequence variation in ORF Rv0746
H37Rv
PE
29 aa
PGRS
BCG
PE
PGRS
46 aa
H37Rv .. ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST
..........
:::::::::::::::::
.............................
:::::::::::
...GSGAPGGAGGAAGLWGT-----------------------------GGAGGIGGAST
BCG ....
H37Rv .. VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG
BCG ....
.......
H37Rv .. LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
.......
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG
BCG ....
H37Rv .. GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF-----------::::::::::::::::::::::::::::::::::::::::::::::::
GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN
BCG ....
.......
H37Rv .. ----------------------------------GGAGGVGGSAGLIGTGGNGGNGGTGA
.......::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::
BCG ....
ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA
H37Rv .. NAGSPGTGGAGGLLLGQNGLNGLP*
::::::::::::::::::::::::
NAGSPGTGGAGGLLLGQNGLNGLP*
BCG ....
.......
Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE
protein families. b, Sequence variation between M. tuberculosis H37Rv and M.
bovis BCG-Pasteur in the PE-PGRS encoded by open reading frame (ORF)
Rv0746.
NATURE | VOL 393 | 11 JUNE 1998
541
article
generated by a family of S-adenosyl-L-methionine-dependent
enzymes, which use the unsaturated meromycolic acid as a substrate
to generate cis and trans cyclopropanes and other mycolates. Six
members of this family have been identified and characterized25 and
two clustered, convergently transcribed new genes are evident in the
genome (umaA1 and umaA2). From the functions of the known
family members and the structures of mycolic acids in M. tuberculosis, it is tempting to speculate that these new enzymes may
introduce the trans cyclopropanes into the meromycolate precursor.
In addition to these two methyltransferases, there are two other
unrelated lipid methyltransferases (Ufa1 and Ufa2) that share
homology with cyclopropane fatty acid synthase of E. coli25.
Although cyclopropanation seems to be a relatively common
modification of mycolic acids, cyclopropanation of plasma-membrane constituents has not been described in mycobacteria. Tuberculostearic acid is produced by methylation of oleic acid, and may
be synthesized by one of these two enzymes.
Condensation of the fully functionalized and preformed meromycolate chain with a 26-carbon a-branch generates full-length
mycolic acids that must be transported to their final location for
attachment to the cell-wall arabinogalactan. The transfer and
subsequent transesterification is mediated by three well-known
immunogenic proteins of the antigen 85 complex26. The genome
encodes a fourth member of this complex, antigen 85C9 ( fbpC2,
Rv0129), which is highly related to antigen 85C. Further studies are
needed to show whether the protein possesses mycolytransferase
activity and to clarify the reason behind the apparent redundancy.
Polyketide synthesis. Mycobacteria synthesize polyketides by several different mechanisms. A modular type I system, similar to that
involved in erythromycin biosynthesis23, is encoded by a very large
operon, ppsABCDE, and functions in the production of
phenolphthiocerol5. The absence of a second type I polyketide
synthase suggests that the related lipids phthiocerol A and B,
phthiodiolone A and phthiotriol may all be synthesized by the
same system, either from alternative primers or by differential
postsynthetic modification. It is physiologically significant that
the pps gene cluster occurs immediately upstream of mas, which
encodes the multifunctional enzyme mycocerosic acid synthase
(MAS), as their products phthiocerol and mycocerosic acid esterify
to form the very abundant cell-wall-associated molecule phthiocerol dimycocerosate (Fig. 4c).
Members of another large group of polyketide synthase enzymes
are similar to MAS, which also generates the multiply methylbranched fatty acid components of mycosides and phthiocerol
dimycocerosate, abundant cell-wall-associated molecules5. Although
some of these polyketide synthases may extend type I FAS CoA
primers to produce other long-chain methyl-branched fatty acids
such as mycolipenic, mycolipodienic and mycolipanolic acids or the
phthioceranic and hydroxyphthioceranic acids, or may even show
functional overlap5, there are many more of these enzymes than
there are known metabolites. Thus there may be new lipid and
polyketide metabolites that are expressed only under certain conditions, such as during infection and disease.
A fourth class of polyketide synthases is related to the plant
enzyme superfamily that includes chalcone and stilbene synthase23.
These polyketide synthases are phylogenetically divergent from all
other polyketide and fatty acid synthases and generate unreduced
polyketides that are typically associated with anthocyanin pigments
and flavonoids. The function of these systems, which are often
linked to apparent type I modules, is unknown. An example is the
gene cluster spanning pks10, pks7, pks8 and pks9, which includes two
of the chalcone-synthase-like enzymes and two modules of an
apparent type I system. The unknown metabolites produced by
these enzymes are interesting because of the potent biological
activities of some polyketides such as the immunosuppressor
rapamycin.
Siderophores. Peptides that are not ribosomally synthesized are
542
article
proteins that are unrelated except for the presence of the common
180-residue PPE domain.
The subcellular location of the PE and PPE proteins is unknown
and in only one case, that of a lipase (Rv3097), has a function been
demonstrated. On examination of the protein database from the
extensively sequenced M. leprae15, no PGRS- or MPTR-related
polypeptides were detected but a few proteins belonging to the
non-MPTR subgroup of the PPE family were found. These proteins
include one of the major antigens recognized by leprosy patients,
the serine-rich antigen34. Although it is too early to attribute
biological functions to the PE and PPE families, it is tempting to
speculate that they could be of immunological importance. Two
interesting possibilities spring to mind. First, they could represent
the principal source of antigenic variation in what is otherwise a
genetically and antigenically homogeneous bacterium. Second,
these glycine-rich proteins might interfere with immune responses
by inhibiting antigen processing.
Several observations and results support the possibility of antigenic variation associated with both the PE and the PPE family
proteins. The PGRS member Rv1759 is a fibronectin-binding
protein of relative molecular mass 55,000 (ref. 35) that elicits a
variable antibody response, indicating either that individuals
mount different immune responses or that this PGRS protein
may vary between strains of M. tuberculosis. The latter possibility
is supported by restriction fragment length polymorphisms for
various PGRS and MPTR sequences in clinical isolates33. Direct
support for genetic variation within both the PE and the PPE
families was obtained by comparative DNA sequence analysis (Fig.
5). The gene for the PEPGRS protein Rv0746 of BCG differs from
that in H37Rv by the deletion of 29 codons and the insertion of 46
codons. Similar variation was seen in the gene for the PPE protein
Rv0442 (data not shown). As these differences were all associated
with repetitive sequences they could have resulted from intergenic
or intragenic recombinational events or, more probably, from
strand slippage during replication32. These mechanisms are
known to generate antigenic variability in other bacterial
pathogens36.
There are several parallels between the PGRS proteins and the
EpsteinBarr virus nuclear antigens (EBNAs). Members of both
polypeptide families are glycine-rich, contain extensive GlyAla
repeats, and exhibit variation in the length of the repeat region
between different isolates. The GlyAla repeat region of EBNA1
functions as a cis-acting inhibitor of the ubiquitin/proteasome
antigen-processing pathway that generates peptides presented in
the context of major histocompatibility complex (MHC) class I
molecules37,38. MHC class I knockout mice are very susceptible to M.
tuberculosis, underlining the importance of a cytotoxic T-cell
response in protection against disease3,39. Given the many potential
effects of the PPE and PE proteins, it is important that further
studies are performed to understand their activity. If extensive
antigenic variability or reduced antigen presentation were indeed
found, this would be significant for vaccine design and for understanding protective immunity in tuberculosis, and might even
explain the varied responses seen in different BCG vaccination
programmes40.
Pathogenicity. Despite intensive research efforts, there is little
information about the molecular basis of mycobacterial virulence41.
However, this situation should now change as the genome sequence
will accelerate the study of pathogenesis as never before, because
other bacterial factors that may contribute to virulence are becoming apparent. Before the completion of the genome sequence, only
three virulence factors had been described41: catalase-peroxidase,
which protects against reactive oxygen species produced by the
phagocyte; mce, which encodes macrophage-colonizing factor42;
and a sigma factor gene, sigA (aka rpoV), mutations in which can
lead to attenuation41. In addition to these single-gene virulence
factors, the mycobacterial cell wall4 is also important in pathology,
NATURE | VOL 393 | 11 JUNE 1998
.........................................................................................................................
Methods
543
article
and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263270
(1997).
6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis
complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869
9874 (1997).
7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for
genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 22212229 (1998).
8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacterium
tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132
3137 (1996).
9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 14531462
(1997).
10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139160 (1994).
11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394401
(1997).
12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to
Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 1096110966 (1997).
13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic
differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 12741282
(1996).
14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature
390, 249256 (1997).
15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res.
7, 802819 (1997).
16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984).
17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 5394 (Academic,
San Diego, 1982).
18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium
tuberculosis. Microb. Comp. Genomics 2, 6373 (1997).
19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S713S
(1995).
20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 21182202 (ASM,
Washington, 1996).
21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis b-ketoacyl ACP synthase by isoniazid.
Science 280, 16071610 (1998).
22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium
tuberculosis. Science 263, 227230 (1994).
23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465
2497 (1997).
24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95184 (Academic,
London, 1982).
25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and phsyiological functions. Prog. Lipid
Res. (in the press).
26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis.
Science 276, 14201422 (1997).
27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in
nonribosomal peptide synthesis. Chem. Rev. 97, 26512673 (1997).
28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a
family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 51895193 (1995).
29. Young, D. B. & Fruth, U. in New Generation Vaccines (eds Levine, M., Woodrow, G., Kaper, J. & Cobon,
G. S.) 631645 (Marcel Dekker, New York, 1997).
30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization
of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63,
544
17101717 (1995).
31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major
polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology
of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 41574165 (1992).
32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS)
present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 8795 (1995).
33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis
Foundation Symp. 217) 160172 (Wiley, Chichester, 1998).
34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen from
Mycobacterium leprae. Infect. Immun. 61, 21452153 (1993).
35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectinbinding proteins. Infect. Immun. 59, 27122718 (1991).
36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422427
(1992).
37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr
virus nuclear antigen-1. Nature 375, 685688 (1995).
38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/
proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus
nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 1261612621 (1997).
39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatability
complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection.
Proc. Natl Acad. Sci. USA 89, 1201312017 (1992).
40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.)
531557 (Am. Soc. Microbiol., Washington DC, 1994).
41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426430 (1996).
42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis
DNA fragment associated with entry and survival inside cells. Science 261, 14541457 (1993).
43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in
survival within macrophages. Infect. Immun. 62, 16231630 (1994).
44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in
genome sequencing. Genome Res. 8, 562566 (1998).
45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res.
24, 49924999 (1995).
46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic
Acids Res. 22, 47684778 (1994).
47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25,
217221 (1997).
48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol.
Biol. 215, 403410 (1990).
49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. USA
85, 24442448 (1988).
50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in
genomic DNA. Nucleic Acids Res. 25, 955964 (1997).
Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones,
M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported
by the Wellcome Trust. Additional funding was provided by the Association Francaise Raoul Follereau,
the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling
research fellowship.
Correspondence and requests for materials should be addressed to B.G.B. (barrell@sanger.ac.uk) or S.T.C.
(stcole@pasteur.fr). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV,
accession number AL123456.
accA2
Rv2502c
accD1
Rv0974c
accD2
Rv3667
Rv3409c
Rv0222
acs
choD
echA1
Rv0456c
echA2
Rv0632c
echA3
Rv0673
echA4
Rv0675
echA5
Rv0905
echA6
Rv0971c
echA7
Rv1070c
echA8
Rv1071c
echA9
Rv1142c
echA10
Rv1141c
echA11
Rv1472
echA12
Rv1935c
echA13
Rv2486
echA14
Rv2679
echA15
acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl/propionyl-CoA carboxylase,
subunit
acetyl-CoA synthase
cholesterol oxidase
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily (aka eccH)
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
superfamily
enoyl-CoA hydratase/isomerase
Rv2831
Rv3039c
Rv3373
Rv3374
Rv3516
Rv3550
Rv3774
Rv0859
Rv0243
Rv1074c
Rv1323
Rv3546
Rv3556c
Rv0860
Rv0468
Rv1715
Rv3141
Rv1912c
Rv1750c
Rv0270
Rv3561
Rv0214
Rv0166
Rv1206
Rv0119
Rv0551c
Rv2590
Rv0099
Rv1550
Rv1549
Rv1427c
Rv3089
Rv1058
Rv2187
Rv0852
Rv3506
Rv3513c
Rv3515c
Rv1185c
Rv2948c
Rv3826
Rv1529
Rv1521
Rv2930
Rv0275c
Rv2941
Rv2950c
Rv0404
Rv1925
Rv3801c
Rv1345
Rv0035
Rv2505c
Rv1193
Rv0131c
Rv0154c
Rv0215c
Rv0231
Rv0244c
Rv0271c
Rv0400c
Rv0672
Rv0752c
Rv0873
Rv0972c
Rv0975c
Rv1346
Rv1467c
Rv1679
Rv1934c
Rv1933c
Rv2500c
Rv2724c
Rv2789c
Rv3061c
Rv3140
Rv3139
Rv3274c
Rv3504
Rv3505
Rv3544c
superfamily
enoyl-CoA hydratase/isomerase
superfamily
echA17 enoyl-CoA hydratase/isomerase
superfamily
echA18 enoyl-CoA hydratase/isomerase
superfamily, N-term
echA18' enoyl-CoA hydratase/isomerase
superfamily, C-term
echA19 enoyl-CoA hydratase/isomerase
superfamily
echA20 enoyl-CoA hydratase/isomerase
superfamily
echA21 enoyl-CoA hydratase/isomerase
superfamily
fadA
oxidation complex, subunit
(acetyl-CoA C-acetyltransferase)
fadA2
acetyl-CoA C-acetyltransferase
fadA3
acetyl-CoA C-acetyltransferase
fadA4
acetyl-CoA C-acetyltransferase
(aka thiL)
fadA5
acetyl-CoA C-acetyltransferase
fadA6
acetyl-CoA C-acetyltransferase
fadB
oxidation complex, subunit
(multiple activities)
fadB2
3-hydroxyacyl-CoA dehydrogenase
fadB3
3-hydroxyacyl-CoA dehydrogenase
fadB4
3-hydroxyacyl-CoA dehydrogenase
fadB5
3-hydroxyacyl-CoA dehydrogenase
fadD1
acyl-CoA synthase
fadD2
acyl-CoA synthase
fadD3
acyl-CoA synthase
fadD4
acyl-CoA synthase
fadD5
acyl-CoA synthase
fadD6
acyl-CoA synthase
fadD7
acyl-CoA synthase
fadD8
acyl-CoA synthase
fadD9
acyl-CoA synthase
fadD10 acyl-CoA synthase
fadD11 acyl-CoA synthase, N-term
fadD11' acyl-CoA synthase, C-term
fadD12 acyl-CoA synthase
fadD13 acyl-CoA synthase
fadD14 acyl-CoA synthase
fadD15 acyl-CoA synthase
fadD16 acyl-CoA synthase
fadD17 acyl-CoA synthase
fadD18 acyl-CoA synthase
fadD19 acyl-CoA synthase
fadD21 acyl-CoA synthase
fadD22 acyl-CoA synthase
fadD23 acyl-CoA synthase
fadD24 acyl-CoA synthase
fadD25 acyl-CoA synthase
fadD26 acyl-CoA synthase
fadD27 acyl-CoA synthase
fadD28 acyl-CoA synthase
fadD29 acyl-CoA synthase
fadD30 acyl-CoA synthase
fadD31 acyl-CoA synthase
fadD32 acyl-CoA synthase
fadD33 acyl-CoA synthase
fadD34 acyl-CoA synthase
fadD35 acyl-CoA synthase
fadD36 acyl-CoA synthase
fadE1
acyl-CoA dehydrogenase
fadE2
acyl-CoA dehydrogenase
fadE3
acyl-CoA dehydrogenase
fadE4
acyl-CoA dehydrogenase
fadE5
acyl-CoA dehydrogenase
fadE6
acyl-CoA dehydrogenase
fadE7
acyl-CoA dehydrogenase
fadE8
acyl-CoA dehydrogenase
(aka aidB)
fadE9
acyl-CoA dehydrogenase
fadE10 acyl-CoA dehydrogenase
fadE12 acyl-CoA dehydrogenase
fadE13 acyl-CoA dehydrogenase
fadE14 acyl-CoA dehydrogenase
fadE15 acyl-CoA dehydrogenase
fadE16 acyl-CoA dehydrogenase
fadE17 acyl-CoA dehydrogenase
fadE18 acyl-CoA dehydrogenase
fadE19 acyl-CoA dehydrogenase
(aka mmgC)
fadE20 acyl-CoA dehydrogenase
fadE21 acyl-CoA dehydrogenase
fadE22 acyl-CoA dehydrogenase
fadE23 acyl-CoA dehydrogenase
fadE24 acyl-CoA dehydrogenase
fadE25 acyl-CoA dehydrogenase
fadE26 acyl-CoA dehydrogenase
fadE27 acyl-CoA dehydrogenase
fadE28 acyl-CoA dehydrogenase
echA16
Rv3543c
Rv3560c
Rv3562
Rv3563
Rv3564
Rv3573c
Rv3797
Rv3761c
Rv1175c
Rv0855
Rv1143
Rv1492
fadE29
fadE30
fadE31
fadE32
fadE33
fadE34
fadE35
fadE36
fadH
far
mcr
mutA
Rv1493
mutB
Rv2504c
scoA
Rv2503c
scoB
Rv1136
Rv1683
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
acyl-CoA dehydrogenase
2,4-Dienoyl-CoA Reductase
fatty acyl-CoA racemase
-methyl acyl-CoA racemase
methylmalonyl-CoA mutase,
subunit
methylmalonyl-CoA mutase,
subunit
3-oxo acid:CoA transferase, subunit
3-oxo acid:CoA transferase, subunit
probable carnitine racemase
possible acyl-CoA synthase
4. Phosphorous compounds
Rv2368c phoH
ATP-binding pho regulon
component
Rv1095
phoH2 PhoH-like protein
Rv3628
ppa
probable inorganic pyrophosphatase
Rv2984
ppk
polyphosphate kinase
B. Energy metabolism
1. Glycolysis
Rv1023
eno
enolase
Rv0363c fba
fructose bisphosphate aldolase
Rv1436
gap
glyceraldehyde 3-phosphate dehydrogenase
Rv0489
gpm
phosphoglycerate mutase I
Rv3010c pfkA
phosphofructokinase I
Rv2029c pfkB
phosphofructokinase II
Rv0946c pgi
glucose-6-phosphate isomerase
Rv1437
pgk
phosphoglycerate kinase
Rv1617
pykA
pyruvate kinase
Rv1438
tpi
triosephosphate isomerase
Rv2419c putative phosphoglycerate mutase
Rv3837c putative phosphoglycerate mutase
2. Pyruvate dehydrogenase
Rv2241
aceE
pyruvate dehydrogenase E1 component
Rv3303c lpdA
dihydrolipoamide dehydrogenase
Rv2497c pdhA
pyruvate dehydrogenase E1 component subunit
Rv2496c pdhB
pyruvate dehydrogenase E1 component subunit
Rv2495c pdhC
dihydrolipoamide acetyltransferase
Rv0462
probable dihydrolipoamide dehydrogenase
3. TCA cycle
Rv1475c acn
Rv0889c citA
Rv2498c citE
Rv1098c fum
Rv1131
gltA1
Rv0896
gltA2
Rv3339c icd1
Rv0066c icd2
Rv0794c lpdB
Rv1240
mdh
Rv2967c pca
Rv3318
sdhA
Rv3319
sdhB
Rv3316
sdhC
Rv3317
sdhD
Rv1248c
Rv2215
sucA
sucB
Rv0951
Rv0952
sucC
sucD
4. Glyoxylate bypass
Rv0467
aceA
Rv1915
aceAa
Rv1916
aceAb
Rv1837c glcB
Rv3323c gphA
aconitate hydratase
citrate synthase 2
citrate lyase chain
fumarase
citrate synthase 3
citrate synthase 1
isocitrate dehydrogenase
isocitrate dehydrogenase
dihydrolipoamide dehydrogenase
malate dehydrogenase
pyruvate carboxylase
succinate dehydrogenase A
succinate dehydrogenase B
succinate dehydrogenase C subunit
succinate dehydrogenase D subunit
2-oxoglutarate dehydrogenase
dihydrolipoamide succinyltransferase
succinyl-CoA synthase chain
succinyl-CoA synthase chain
isocitrate lyase
isocitrate lyase, module
isocitrate lyase, module
malate synthase
phosphoglycolate phosphatase
Rv2436
Rv1408
Rv2465c
Rv1448c
Rv1449c
Rv1121
rbsK
rpe
rpi
tal
tkt
zwf
Rv1447c
zwf2
6. Respiration
a. aerobic
Rv0527
ccsA
Rv0529
ccsB
Rv1451
ctaB
Rv2200c
Rv3043c
ctaC
ctaD
Rv2193
ctaE
Rv1542c
Rv2470
Rv2249c
glbN
glbO
glpD1
Rv3302c
glpD2
Rv0694
lldD1
Rv1872c
Rv1854c
Rv3145
Rv3146
Rv3147
Rv3148
Rv3149
Rv3150
Rv3151
Rv3152
Rv3153
Rv3154
Rv3155
Rv3156
Rv3157
Rv3158
Rv2195
lldD2
ndh
nuoA
nuoB
nuoC
nuoD
nuoE
nuoF
nuoG
nuoH
nuoI
nuoJ
nuoK
nuoL
nuoM
nuoN
qcrA
Rv2196
qcrB
Rv2194
qcrC
b. anaerobic
Rv2392
cysH
Rv2899c
Rv2900c
fdhD
fdhF
Rv1552
frdA
Rv1553
frdB
Rv1554
frdC
Rv1555
frdD
Rv1161
Rv1162
Rv1164
Rv1163
Rv1736c
Rv2391
narG
narH
narI
narJ
narX
nirA
Rv0252
Rv0253
nirB
nirD
ribokinase
ribulose-phosphate 3-epimerase
phosphopentose isomerase
transaldolase
transketolase
glucose-6-phosphate 1-dehydrogenase
glucose-6-phosphate 1-dehydrogenase
3'-phosphoadenylylsulfate (PAPS)
reductase
affects formate dehydrogenase-N
molybdopterin-containing oxidoreductase
fumarate reductase flavoprotein
subunit
fumarate reductase iron sulphur
protein
fumarate reductase 15kD anchor
protein
fumarate reductase 13kD anchor
protein
nitrate reductase subunit
nitrate reductase chain
nitrate reductase chain
nitrate reductase chain
fused nitrate reductase
probable nitrite reductase/sulphite
reductase
nitrite reductase flavoprotein
probable nitrite reductase small
subunit
c. Electron transport
Rv0409
ackA
acetate kinase
Rv1623c appC
cytochrome bd-II oxidase
subunit I
Rv1622c cydB
cytochrome d ubiquinol oxidase
subunit II
Rv1620c cydC
ABC transporter
Rv1621c cydD
ABC transporter
Rv2007c fdxA
ferredoxin
Rv3554
fdxB
ferredoxin
Rv1177
fdxC
ferredoxin 4Fe-4S
Rv3503c fdxD
probable ferredoxin
Rv3029c fixA
electron transfer flavoprotein
subunit
Rv3028c fixB
electron transfer flavoprotein
subunit
Rv3106
fprA
adrenodoxin and NADPH ferredoxin reductase
Rv0886
fprB
ferredoxin, ferredoxin-NADP
reductase
Rv3251c rubA
rubredoxin A
Rv3250c
rubB
rubredoxin B
chain
chain
e chain
chain
c chain
b chain
chain
chain
5. Sulphur
Rv0711
Rv3299c
Rv0663
Rv3077
Rv0296c
Rv3796
Rv1285
Rv1286
Rv2131c
Rv3248c
Rv3283
Rv2291
Rv3118
Rv0814c
Rv3762c
glucosamine-fructose-6phosphate aminotransferase
metabolism
atsA
arylsulfatase
atsB
proable arylsulfatase
atsD
proable arylsulfatase
atsF
proable arylsulfatase
atsG
proable arylsulfatase
atsH
proable arylsulfatase
cysD
ATP:sulphurylase subunit 2
cysN
ATP:sulphurylase subunit 1
cysQ
homologue of M.leprae cysQ
sahH
adenosylhomocysteinase
sseA
thiosulfate sulfurtransferase
sseB
thiosulfate sulfurtransferase
sseC
thiosulfate sulfurtransferase
sseC2
thiosulfate sulfurtransferase
probable alkyl sulfatase
Rv1878
Rv2860c
Rv2918c
Rv2221c
glnA3
glnA4
glnD
glnE
Rv3859c
gltB
Rv3858c
gltD
Rv3704c
gshA
Rv2427c
Rv2439c
Rv0500
proA
proB
proC
2. Aspartate family
Rv3708c asd
Rv3709c
Rv2201
Rv3565
Rv0337c
Rv2753c
Rv2773c
Rv1202
ask
asnB
aspB
aspC
dapA
dapB
dapE
Rv2141c
dapE2
Rv2726c
Rv1293
Rv3341
Rv1079
Rv3340
Rv1133c
dapF
lysA
metA
metB
metC
metE
Rv2124c
metH
Rv1392
Rv0391
metK
metZ
Rv1294
Rv1296
Rv1295
thrA
thrB
thrC
3. Serine family
Rv0815c cysA2
Rv3117
cysA3
Rv2335
cysE
Rv0511
cysG
Rv2847c
cysG2
Rv2334
Rv1336
Rv1077
Rv0848
Rv1093
Rv0070c
Rv2996c
cysK
cysM
cysM2
cysM3
glyA
glyA2
serA
Rv0505c
serB
Rv3042c
serB2
Rv0884c
serC
thiosulfate sulfurtransferase
thiosulfate sulfurtransferase
serine acetyltransferase
uroporphyrin-III c-methyltransferase
multifunctional enzyme, siroheme
synthase
cysteine synthase A
cysteine synthase B
cystathionine -synthase
putative cysteine synthase
serine hydroxymethyltransferase
serine hydroxymethyltransferase
D-3-phosphoglycerate dehydrogenase
probable phosphoserine phosphatase
C-term similar to phosphoserine
phosphatase
phosphoserine aminotransferase
Rv1601
hisB
Rv1600
hisC
Rv3772
hisC2
Rv1599
hisD
phosphoribosylformimino-5aminoimidazole carboxamide
ribonucleotide isomerase
imidazole glycerol-phosphate
dehydratase
histidinol-phosphate aminotransferase
histidinol-phosphate aminotransferase
histidinol dehydrogenase
Rv1605
hisF
Rv2121c
Rv1602
Rv2122c
hisG
hisH
hisI
Rv1606
hisI2
Rv0114
6. Pyruvate family
Rv3423c alr
imidazole glycerol-phosphate
synthase
ATP phosphoribosyltransferase
amidotransferase
phosphoribosyl-AMP cyclohydrolase
probable phosphoribosyl-AMP 1,6
cyclohydrolase
similar to HisB
Rv3048c
nrdG
Rv3053c
nrdH
Rv3052c
Rv3247c
Rv2764c
Rv0570
Rv3752c
nrdI
tmk
thyA
nrdZ
-
subunit
ribonucleoside-diphosphate small
subunit
glutaredoxin electron transport
component of NrdEF system
NrdI/YgaO/YmaA family
thymidylate kinase
thymidylate synthase
ribonucleotide reductase, class II
probable cytidine/deoxycytidylate
deaminase
alanine racemase
E. Polyamine synthesis
Rv2601
speE
spermidine synthase
F. Purines, pyrimidines, nucleosides and nucleotides
1. Purine ribonucleotide biosynthesis
Rv1389
gmk
putative guanylate kinase
Rv3396c guaA
GMP synthase
Rv1843c guaB1
inosine-5'-monophosphate dehydrogenase
Rv3411c guaB2
inosine-5'-monophosphate dehydrogenase
Rv3410c guaB3
inosine-5'-monophosphate dehydrogenase
Rv1017c prsA
ribose-phosphate pyrophosphokinase
Rv0357c purA
adenylosuccinate synthase
Rv0777
purB
adenylosuccinate lyase
Rv0780
purC
phosphoribosylaminoimidazolesuccinocarboxamide synthase
Rv0772
purD
phosphoribosylamine-glycine ligase
Rv3275c purE
phosphoribosylaminoimidazole
carboxylase
Rv0808
purF
amidophosphoribosyltransferaseRv0957
purH
phosphoribosylaminoimidazolecarboxamide formyltransferase
Rv3276c purK
phosphoribosylaminoimidazole
carboxylase ATPase subunit
Rv0803
purL
phosphoribosylformylglycinamidine synthase II
Rv0809
purM
5'-phosphoribosyl-5-aminoimidazole synthase
Rv0956
purN
phosphoribosylglycinamide
formyltransferase I
Rv0788
purQ
phosphoribosylformylglycinamidine synthase I
Rv0389
purT
phosphoribosylglycinamide
formyltransferase II
Rv2964
purU
formyltetrahydrofolate deformylase
folE
folK
Rv3608c
Rv1207
Rv3607c
folP
folP2
folX
Rv0013
pabA
Rv1005c
Rv0812
pabB
pabC
3. Lipoate
Rv2218
lipA
Rv2217
lipB
3. 2'-deoxyribonucleotide metabolism
Rv0321
dcd
deoxycytidine triphosphate
deaminase
Rv2697c dut
deoxyuridine triphosphatase
Rv0233
nrdB
ribonucleoside-diphosphate
reductase B2 (eukaryotic-like)
Rv3051c nrdE
ribonucleoside diphosphate
reductase chain
Rv1981c nrdF
ribonucleotide reductase small
4. Molybdopterin
Rv3109
moaA
Rv0869c
moaA2
Rv0438c
moaA3
Rv3110
moaB
Rv0984
moaB2
Rv3111
moaC
Rv0864
moaC2
Rv3324c
moaC3
Rv3112
moaD
Rv0868c
moaD2
dihydrofolate reductase
folylpolyglutamate synthase
methylenetetrahydrofolate dehydrogenase
GTP cyclohydrolase I
7,8-dihydro-6-hydroxymethylpterin
pyrophosphokinase
dihydropteroate synthase
dihydropteroate synthase
may be involved in folate biosynthesis
p-aminobenzoate synthase glutamine amidotransferase
p-aminobenzoate synthase
aminodeoxychorismate lyase
Rv3119
moaE
Rv0866
moaE2
Rv3322c
moaE3
Rv0994
Rv3116
Rv2338c
Rv1681
Rv1355c
Rv3206c
moeA
moeB
moeW
moeX
moeY
moeZ
Rv0865
mog
5. Pantothenate
Rv1092c coaA
Rv2225
panB
panC
panD
Rv3602c
Rv3601c
6. Pyridoxine
Rv2607
pdxH
7. Pyridine
Rv1594
Rv1595
Rv1596
Rv0423c
subunit 1
molybdopterin-converting factor
subunit 2
molybdopterin-converting factor
subunit 2
molybdopterin-converting factor
subunit 2
molybdopterin biosynthesis
molybdopterin biosynthesis
molybdopterin biosynthesis
weak similarity to E. coli MoaA
weak similarity to E. coli MoeB
probably involved in
molybdopterin biosynthesis
molybdopterin biosynthesis
pantothenate kinase
3-methyl-2-oxobutanoate
hydroxymethyltransferase
pantoate--alanine ligase
aspartate 1-decarboxylase
pyridoxamine 5'-phosphate
oxidase
nucleotide
nadA
quinolinate synthase
nadB
L-aspartate oxidase
nadC
nicotinate-nucleotide pyrophosphatase
thiC
thiamine synthesis, pyrimidine
moiety
8. Thiamine
Rv0422c thiD
Rv0414c thiE
Rv0417
thiG
Rv2977c
thiL
9. Riboflavin
Rv1940
ribA
Rv1415
ribA2
Rv1412
ribC
Rv2671
ribD
Rv2786c ribF
Rv1409
ribG
Rv1416
ribH
Rv3300c -
phosphomethylpyrimidine kinase
thiamine synthesis, thiazole
moiety
thiamine synthesis, thiazole
moiety
probable thiamine-monophosphate kinase
GTP cyclohydrolase II
probable GTP cyclohydrolase II
riboflavin synthase chain
probable riboflavin deaminase
riboflavin kinase
riboflavin biosynthesis
riboflavin synthase chain
probable deaminase, riboflavin
synthesis
and porphyrin
hemA
glutamyl-tRNA reductase
hemB
-aminolevulinic acid dehydratase
hemC
porphobilinogen deaminase
hemE
uroporphyrinogen decarboxylase
Rv1300
Rv0524
hemK
hemL
Rv2388c
hemN
Rv2677c
Rv1485
hemY'
hemZ
13. Cobalamin
Rv2849c cobA
Rv2848c cobB
Rv2231c
Rv2236c
Rv2064
Rv2065
Rv2066
Rv2070c
Rv2072c
Rv2071c
Rv2062c
Rv2208
cobC
cobD
cobG
cobH
cobI
cobK
cobL
cobM
cobN
cobS
Rv2207
cobT
Rv0254c
Rv0255c
Rv3713
Rv0306
cobU
cobQ
cobQ2
-
viuB
Rv3525c
protoporphyrinogen oxidase
glutamate-1-semialdehyde aminotransferase
oxygen-independent coproporphyrinogen III oxidase
protoporphyrinogen oxidase
ferrochelatase
cob(I)alamin adenosyltransferase
cobyrinic acid a,c-diamide
synthase
aminotransferase
cobinamide synthase
percorrin reductase
precorrin isomerase
CobI-CobJ fusion protein
precorrin reductase
probable methyltransferase
precorrin-3 methylase
cobalt insertion
cobalamin (5'-phosphate)
synthase
nicotinate-nucleotide-dimethylbenzimidazole transferase
cobinamide kinase
cobyric acid synthase
possible cobyric acid synthase
similar to BluB cobalamin synthesis protein R. capsulatus
bacterioferritin
bacterioferritin
probable isochorismate synthase
weak similarity to many phosphoglycerate mutases
similar to proteins involved in
vibriobactin uptake
similar to ferripyochelin binding
protein
H. Lipid biosynthesis
1. Synthesis of fatty and mycolic acids
Rv3285
accA3
acetyl/propionyl CoA carboxylase
subunit
Rv0904c accD3
acetyl/propionyl CoA carboxylase
subunit
Rv3799c accD4
acetyl/propionyl CoA carboxylase
subunit
Rv3280
accD5
acetyl/propionyl CoA carboxylase
subunit
Rv2247
accD6
acetyl/propionyl CoA carboxylase
subunit
Rv2244
acpM
acyl carrier protein (meromycolate
extension)
Rv2523c acpS
CoA:apo-[ACP] pantethienephosphotransferase
Rv2243
fabD
malonyl CoA-[ACP] transacylase
Rv0649
fabD2
malonyl CoA-[ACP] transacylase
Rv1483
fabG1
3-oxoacyl-[ACP] reductase (aka
MabA)
Rv1350
fabG2
3-oxoacyl-[ACP] Reductase
Rv2002
fabG3
3-oxoacyl-[ACP] reductase
Rv0242c fabG4
3-oxoacyl-[ACP] reductase
Rv2766c fabG5
3-oxoacyl-[ACP] reductase
Rv0533c fabH
-ketoacyl-ACP synthase III
Rv2524c fas
fatty acid synthase
Rv1484
inhA
enoyl-[ACP] reductase
Rv2245
kasA
-ketoacyl-ACP synthase
(meromycolate extension)
Rv2246
kasB
-ketoacyl-ACP synthase
(meromycolate extension)
Rv1618
tesB1
thioesterase II
Rv2605c tesB2
thioesterase II
Rv0033
possible acyl carrier protein
Rv1344
possible acyl carrier protein
Rv1722
possible biotin carboxylase
Rv3221c resembles biotin carboxyl carrier
Rv3472
possible acyl carrier protein
2. Modification of fatty and mycolic acids
Rv3391
acrA1
fatty acyl-CoA reductase
Rv3392c cmaA1 cyclopropane mycolic acid
synthase 1
Rv0503c cmaA2 cyclopropane mycolic acid synthase 2
Rv0824c desA1
acyl-[ACP] desaturase
Rv1094
desA2
acyl-[ACP] desaturase
Rv3229c desA3
acyl-[ACP] desaturase
Rv0645c mmaA1 methoxymycolic acid synthase 1
Rv0644c mmaA2 methoxymycolic acid synthase 2
Rv0643c mmaA3 methoxymycolic acid synthase 3
Rv0642c mmaA4 methoxymycolic acid synthase 4
Rv0447c ufaA1
unknown fatty acid methyltransferase
Rv3538
ufaA2
unknown fatty acid methyltransferase
Rv0469
umaA1 unknown mycolic acid methyltransferase
Rv0470c
umaA2
Rv2931
Rv2932
Rv2933
Rv2934
Rv2935
Rv2928
Rv1544
ppsA
ppsB
ppsC
ppsD
ppsE
tesA
-
2,100,001
1,950,001
1,800,001
1,650,001
1,500,001
1,350,001
1,200,001
1,050,001
900,001
750,001
600,001
450,001
300,001
150,001
aA
sM
43
09
urF
42
09
cy
bU
pra
etB
m
44 45
09 09
12
12
PE
15
12
pB
10 pC
tr
16
B
trp
55 856
1
18
od
27
17
28 729
17
1
h
nd
26
17
A
61 h
18 ad
63 64 65
18 18 18
32 33 34 35
17 17 17 17
B
C
od
od
m
m
D
od
m
1
bD
ga
30
17
bc
rX
na
trp
E
trp
47
13
74
14
aA
ch
73
14
uT
le
t
lg
18 19 20
12 12 12
RS
G
_P
PE
cD
su
14
fa
dE
oY
ph
bP
17
12
12
A B hA
trx trx ec
fa
86
10
50
09
cC
su
T
ho
RS
G
_P
PE
15
dE
fa
ctp
4 3
s1 34
pk 1
21
05
20
05
ga
3
rK
na
35
01
41
01
54
09
22
08
rp
oC
66
18
2
rK
na
ac
48
13
sig
38
17
87 PE
03 P
pk
42
01
nA
49
13
A
htr
67
18
39
17
py
kA
55
09
23
08
78
14
68
18
40 41 42
17 17 17
1
sB
te
77
14
44
01
ro
dA
26
12
19
16
69
18
nE
pk
fa
dE
70 71
D2
18 18 lld
bH
fa
A
en
m
dE
fa
47
17
p
ap
hA
in
73 74 75
18 18 18
dB
cy
56
13
bG
97
10
86
14
A
bfr
34
12
qY
25 uV
le
16
77
18
49
17
48
17
27
90
14
79
18
51
17
52
17
91
14
lA
A2
c
ac
lp
E
PP
sA
rp
54
17
31
16
utB
m
62 63
13 13
RS
G
_P
PE
'
tB
ly
plc
66
13
67
13
RS
G
_P
PE
45
08
89 90
18 18
35
16
99 00
14 15
rF
lp
cA
su
94
18
91 92 93
18 18 18
RS )
G 22
_P ag
PE (w
34
16
98
14
10
58
61
IS
17
56 57
17 17
88
18
rL
na
90 691
0
06
43
08
82
02
pta
hE
ad
95
18
8
s1
pk
50
12
d2
gn
60
17
lp
98
18
pD
61 62
17 17
rA
uv
oB
J
lip
10
61
IS
cin
sX
ly
09
15
77
13
10
15
78
13
htp
03 04
19 19
aa
yrR
55
12
dA
gp
91
02
51
00
dA
gm
yrB
56
12
29
11
06
19
07
19
69
17
yrC
57
12
45
16
tG
ka
70
17
58
12
A1
glt
grc
C2
65
05
73
17
72
17
47
16
15
15
24
04
61 62
12 12
etE
m
62
08
PE
2
iB
am
yrF
14
19
13
19
76
17
75
17
eS
ph
Aa
e
ac
77
17
eT
ph
sig
83
01
ctp
nrd
71
05
Ab
e
ac
78
17
fa
2
dD
79
17
nH
pk
E
PP
80
17
81
17
arg
12
pL
m
m
C
bR
em
37 38 39
11 11 11
F k 0
9
IH
dfp
m gm 13
65
12
RS
G
_P
PE
E
PP
E
PP
36
11
64
00
26
07
75
05
02
10
RS
G
_P
PE
93
13
arg
E
PP
82
17
arg
24
15
44
11
04
10
26
15
95
13
83
17
19
19
ic
74
08
20
19
pF
lp
arg
73
12
45
11
31
07
47
11
76
08
85
17
s5
22
19
86
17
0
s1
pk
pk
97 98 pH
13 13 li
D
lip
94
01
80
05
qN
lp
81 82
05 05
39
04
I
lip
24
19
E
PP
01
14
77
12
fa
31
dD
pA
26
19
E
PP
s7
pa
pk
riA
78
12
51
11
49 50
11 11
09
10
79
08
E
IK
-L
IS
08
10
E
PP
P
RE
48
11
77
08
37
07
EL
gro
38
07
84
05
41
04
11
03
fa
2
dD
05
14
27
19
28 29 30 31
19 19 19 19
95
17
tp
18
dE
fa
17
dE
faa
88
08
36
19
98
17
S
ile
rG
lp
7
s1
pk
10
14
sN
cy
37
19
pT
s9
pk
90
08
rL
lp
52
04
E
PP
88
12
rH
na
89
12
90
12
rJ rI
na na
66
16
E
PP
A
hB 39
ep 19 rib
1
s1
pk
sA
an
94
08
RS
G
_P
PE
22
14
P
RE
45
19
pG
lp
04 05
18 18
72
16
44 45 46
15 15 15
21
14
arg
qW
lp
1
aE
dn
24
14
64
04
E
PP
dD
fa
rB
th
PE
12
E
PP
54
19
fa
39
03
66
04
58 59 60
19 19 19
62
19
61
19
12
18
16
80
dE
16
fa
74
11
63
19
13
18
PE
14
18
82
16
B1
pls
X
oe
29
14
dH
2
aA
um
1
aA
um
60
15
IS
67
07
07
09
16
06
A
ald
83
16
64 965
19
1
15 16
18 18
A
frd
31
14
rfe
43
03
C
lip
79
11
PE
E
ctp
69
07
3
ce
m
17
18
67
19
RS
G
_P
PE
68
19
85 86 87
16 16 16
84
16
qJ
lp
69
19
19
18
88
16
ty
G
atp
tA
gg
rM
lp
Z
glg
i
tp
71
19
G
ilv
79
04
25
02
13
09
74
07
Y
glg
cA
76
19
72 73 74 75
19 19 19 19
se
cN
re
X
glg
bis
urA
m
3
pA
pa
48
10
PE
tP
be
79
07
cB
re
84
04
s
rr
85
04
E
PP
98
16
77
19
78
19
79
19
4
pt6
m
F
nrd
cC
f2
2
dD
f
rr
82
19
P
RE
84
19
RS
G
_P
PE
v
gc
t
tk
18
13
cA
ro
57
10
23
09
85
19
33
18
03
17
72
15
cy
cA
86
19
34
18
93
04
94
04
56
03
36
02
11
01
95
04
rA
pu
86
07
27
09
I
sig
20
13
90
11
87 88
19 19
89 90
19 19
36
18
E
PP
91
19
qI
lp
92
07
93
07
dB
lp
4
dA
3
dD
G
ctp
glc
07
17
RS
G
_P
PE
fa
fa
63 qV
10
lp
r
qo
87 88
15 15
93 94
19 19
95
19
41
18
96
19
56
14
F
ctp
42
18
13
17
57 458
1
14
27
13
93
15
dA
na
59
14
P
glg
pE
da
37
09
98
19
aB
gu
99
19
d
gn
72
10
dB
na
60
14
48
02
22 23
01 01
47
02
45 46
18 18
fa
01
bG
fa
20
47 A B
18 ure ure
20 21
17 17
03
20
C
ure
22
17
23
17
30
13
98
15
dC 597
1
na
61
14
75
10
05
12
3
dA
39
09
05
08
53
06
lJ lL
rp rp
2
pL
m
m
04
08
G
din
04
12
73
10
38
09
rL
pu
37
dD 650
0
2
pS
m
m
fa
03
12
00
20
46
02
21
01
69 70 71 72
03 03 03 03
45
02
sA
fu
rB
se
68
03
8
hA chA
e
tA
ps
01
12
ec
tC
ps
02
08
pC 801
pe
0
48
06
2
4
aA 050
cm
00
12
69
10
92
15
5
dE
7
dD
3
T
6
8
14
19
dB 71 717 71
17
1 1 1
17 pro
fa
B 90 91
bio 15 15
55
14
99
11
fa
fa
65 366 367
03
0 0
02
05
1
oS
ph
99
07
81
97 98 10
11 11
IS
B
glg
P
RE
tB
ps
98
07
64
03
fa
2
dA
c
ox
1
lE
ga
47
06
a
fb
y
ox
RS
G
_P
PE
53
14
k
09 10 11
17 17 17 cm
86
15
47
15
IS
tS
ps
E
PP
bG
97
07
G
lip
C
pro
gtE
m
fa
RS
G
_P
PE
38 39
RS
G
18 18
_P
PE
08
17
94
11
24
13
PE
RS
G
_P
PE
65 66
10 10
nD
pk
10
61
IS
95 96
07 07
4
3
2
1
aA
aA
aA
aA
m
m
m
m
m
m
m
m
cta
92
11
61
03
16
01
41
02
98 499
04
0
60
03
97
04
15
01
38 39 40
02 02 02
hA 14
gm 01
58 59
03 03
96
04
a
gc
2
2
1
oS
tC
tA
ps
ps
ph
91
07
21 22
13 13
91
11
77 78 79 80 81
15 15 15 15 15
E
PP
35
18
76
15
26
09
89 90
07 07
rQ
pu
RS
G
_P
PE
19
13
88
11
fa
14
dD
25
09
87
07
E G
rT etT 35 36 37 pT c s lK lA
th m 06 06 06 tr se nu rp rp
92
04
RS
G
_P
10
01
PE
35
02
3
gX
p
m
nra
34
06
re
08
01
D 71 73 74 75
bio 15 15 15 15
86
11
alk
02
17
F
bio
l
ta
t
og
84
07
3
nX
85
07
se
E
PP
2
bD
ga
3
33
hA 06
22
09
35
15
IS
21
09
83
07
ec
88 pm
04
g
32 rdB
02
n
I
ctp
54 55 X 56
10 10 leu 10
00 01
17 17
A
bio
zw
fa
53
10
52
10
20
09
4
dE
87
04
fa
54
15
IS
30 31
18 18
rG
py
66 67
15 15
rrl
84
11
44 vB cA
14 de op
51
10
T
arg
18 19
09 09
re
86
04
Bb trBa
rC
p
pu ptr
10
pL
m
m
65
15
97
16
06
01
29 230
02
0
aJ pR
dn hs
28
02
B
m
rp
49 50
10 10
43
14
78
07
83
04
E
grp
27
02
04
01
2
sA 23 24 25 vH 27 28 29
pg 18 18 18 gc 18 18 18
95
16
RS
G
_P
PE
14
13
rB
pu
cD
re
urB
m
aK
dn
E
PP
47
10
81
10
IS
57
15
IS
46
10
13
13
92 93 A
16 16 tly
G
ec
C 12
atp 13
s
pk
14
09
76
07
75
07
B
ctp
26
02
80 81
04 04
28
06
26 27
06 06
44 45
10 10
39
14
D
atp
43
10
rJ 91
lp 16
60 61
15 15
k
pg
24
02
02
01
47 348 349
0
03
0
25
06
22 23 24
06 06 06
E
IK
-L
IS
41 42
10 10
rS
A
ilv
p
ga
23
02
09 10 11 12
09 09 09 09
s
pk
A
atp
P2
aro
A1
ch
21
06
73
04
45
03
21
02
70 71 urD
07 07
p
6
pL 8
m 55
1
m
35
14
33 34
14 14
B dC D 56
frd fr frd 15
32
14
H
03 B E F
13 atp atp atp atp
xC 78
fd 11
33 34 35 36 37 38 PE
P
10 10 10 10 10 10
76
11
19
02
71 72
04 04
42
03
18
02
nrp
'
lK
14 15 17 lT lT
06 06 06 ga ga ga
2
dB
66
07
fa
41
03
W
lip
00
01
16
02
1
dD
40
03
fa
6
hA 906
ec
0
65
07
K
m 301
1
he
fa
32
10
cD
ac
eA
13
06
ac
3
dE
fa
97 98
00 00
4
dD
12
06
pC
kd
03
09
E A
m
rp prf
'
11 11
dD dD
fa
fa
28
14
o
rh
10 gtC
18 m
76 bF 678
16 ds
1
O
lip
rC
th
73
11
02
09
pB
kd
pA 00 01
om 09 09
pA
kd
98
08
10 611
0
06
65
04
13
02
E
PP
oR
ph
52 53 55 56 57
19 19 19 19 19
E
PP
75
16
48 49 50 51
19 19 19 19
E
PP
97
08
o
ph
70 171
11
1
25
14
rA
th
PE
pD
kd
A2
glt
rV 56
th 07
38
03
dR
na
05 06 07 08 09
06 06 06 06 06
36
15
IS
P
RE
94 095
0
00
62 463
04
0
93
00
A
ck
pC
as
ctp
59 60 61
04 04 04
36
03
03 qO
06 lp
PE
73 674
16
1
47
19
PE
23
14
sA
ly
pE
kd
E
PP
E
PP
95
08
67
11
RS
G
_P
PE
69 70 71
16 16 16
43
15
42 43 44
19 19 19
41
19
E
PP
68
16
rI
lp lbN
g
67
16
0
pA 54
1
ls
vrC
91 V
12 arg
65
11
57
04
32 33 lA
03 03 rm
10
02
58
04
91
00
09
02
98 99 00 01 rA
05 05 06 06 tc
31
03
07 08
02 02
88 89 090
0
00 00
24 25 26
10 10 10
97
05
93
08
o
en
sA
m
m
95 96
05 05
55 A2
04 ch
e
54
04
cE
hy
3
pL
m
m
29 30
03 03
92
08
qU
lp
94
05
E
PP
dE
fa
21
10
91
08
sB
m
m
27
03
cQ
hy
28
03
05
02
cP
hy
A2 bH 17 rH
19
rib
ri 14 lp
14
87
12
fd
m
din
13 14
14 14
rG
na
lp
92
05
4
pS
m
m
23
03
04
02
cD
hy
24 25 26
03 03 03
03
02
83
00
RS
G
_P
48 49 50
07 07 07
PE
91
05
4
pL
m
m
A
cit
C
rib
84 ysD
c
12
19
10
90
05
gA
ud
11
pL
m
m
79 80 81 82
00 00 00 00
20 d
03 dc
T
U
ln
glm g
2
utT
m
13
hA
ec
97
17
49
04
2
ce
m
RS
G
_P
PE
pB
op
35
15
G
rib
pC
59
11
op
34
15
e
rp
96
17
s8
pk
33
15
pD
87
08
45
07
qT rsA
lpp
p
rBB
fp
58
11
op
u
fm
57
11
48
04
86 87 588
0
05 05
43 44
07 07
pth rplY
85
08
32
15
31
15
t
fm
pA
56
11
6
s1
pk
op
ad
92 93 94
PE 17 17 17
03
14
04
14
79
12
55
11
rC
se
11 12
10 10
t
4
om 115
52
11
gA
ks
83
08
'
57
15
IS
gllp
pc
01
02
99 00
01 02
78
00
2
3
Q lyU 18
g 03
98
01
76 77
00 00
15 16
03 03
75
00
44 K 46
A1
04 sig 04
ufa
85
05
43
04
14
03
97
01
74
00
RS
PG
39 40 41 _
07 07 07 PE
E
PP
12
03
13
03
73
00
95 96
01 01
'
71 P 72
00 RE 00
80 81 82
08 08 08
10
03
08 09
03 03
gly
A2
'
L 36
k ap
ad m sig 07
E
PP PE
76
12
A3
79
05
oa
07
03
aA
sd
06
03
etS
m
c
se
rB rC
lp lp
46
11
75
08
30
07
93
01
68
00
sA psd
ps
67
00
06
10
E
PP
92
01
RS
G
_P
PE
35
04
RS
G
_P
PE
84
17
d2
91
01
bB
pa
F
R
G
arg arg arg
l2
bb
w
94
13
cr
m
fa
1
dE
lB
xy
34
04
90
01
65
00
77
05
72
12
28
07
76
05
03
10
cA
fu
68 69 rA 71
12 12 lp 12
23
15
D
ilv
C
30 31 d
33
04 04 so
04
E
PP
87 88
01 01
11
10
hA chA
ec
e
40
11
arc
etK
m
arg
99 00
09 10
cs
25
07
pB
de
74
05
2
D2 A 870
oa oa 0
m m
V
96
97 98
09 ala 09 09
64
12
34
11
J
rim
glS
63
00
28
04
p
sp
73
05
26 thA
04
x
72
05
67
08
C2
E2
63 oa og oa
08 m m m
lA
ce
84 85
01 01
60 61
00 00
RS
G
_P
98 99 00 01 02 03
02 02 03 03 03 03
PE
81
01
59
00
D
lN lX lE sN sH lF lR sE m lO
rp rp rp rp rp rp rp rp rp rp
69
05
80
01
A
oe
m
13
07
68
05
lU
ga
ats
rO
lp
aB
dn
arB
60
12
74
17
48
16
16
15
59
12
32
11
12
07
61
08
90 91 92
09 09 09
fa
dB
ats
66
05
iC
th
95
02
rT 567
0
ty
5
rA 910 pC
dB
1 lp
fa
fu
71
17
PE
14
15
82 arA
13
c
30
11
88
09
fa
dA
93
02
94
02
75 76 77 78
01 01 01 01
52 sF b sR lI 57
00 rp ss rp rp 00
20 21 iD
04 04 th
92
02
74
01
qM
lp
iA 513
ep
1
58
08
87
09
56 57
08 08
I
fC m lT nR
in rp rp ts
54
12
28
11
P
RE
RS
G
_P
PE
aD
de
dK
pp
sc
86
09
54 r
08 fa
B2
oa
m
66 67
17 17
nT
na
pd
C1
grc
lp
qL
rK
lp
nA
po
C
sJ lC lD lW lB sS lV sC lP m sQ
rp rp rp rp rp rp rp rp rp rp rp
61
05
15 16 iG
04 04 th
90
02
72
01
49
00
87 88 89
02 02 02
71
01
47 048
00
0
59 60
05 05
76
13
rE
lp
iE
th
iE
ub
83
09
26
11
'
B9
IS
08
15
75
13
25
11
fa
1
dD
E
PP
70
01
46
00
3
utT
m
PE
98 99
06 06
57
05
12
04
82
09
65
17
39
16
74
13
51
12
63 64
17 17
hC
ep
81
09
07
15
73
13
bp
03 04 05 06
15 15 15 15
71
13
zw
56
05
'
06 51
16 08
IS
49 50
08 08
97
06
D
en
m
H
gln
84
02
1
ce
m
44 045
0
00
67 168
01
0
42 043
00
0
96
06
3
sM
cy
95
06
79
RS
09
G
_P
PE
lp
qS
96 97
18 18
02
15
37
16
36
16
01
15
69 70
13 13
10
61
IS
49
12
18 19 20
11 11 11
RS
G
_P
PE
46
08
nG
pk
fa
5
dD
C
oC
en
m
bp
kA
ac
D1
lld
52
05
uS
le
83
02
65
01
63 64
01 01
qE
92
06 pq
dD
fa
07
04
81
02
61
01
39 40
00 00
38
00
12 13 14 15 16 17
11 11 11 11 11 11
76
09
42
08
L
lip
rB
uv
06
04
E
PP
37
00
B 49 50
en 05 05
m
89
06
88
06
PE
36
00
45 46 47
12 12 12
65
13
qZ
lp
87
18
32
16
13
11
11
dE
fa
94 95 96
14 14 14
bU
rs
87
06
46 47
05 05
40 841
08
0
PE
s6
pk
34
dD
fa
RS
G
_P
PE
58
01
33 34
00 00
86
06
cD
ac
39
08
09
11
qR
f
tu
dh 41 42
m 12 12
06 eB eA
11 xs xs
utA
m
F2
E 3 4
A
en 54 54 pit
m 0 0
lp
po
fu
30
dD
fa
RS
G
_P
PE
sA
41
05
37
08
E
PP
05
11
rA
co
60
13
04
11
g
su
36
08
7
12
hA
dE
ec
fa
70
09
lp
80
18
59
13
gB
su
39 40
05 05
1
pS
m
m
77
02
81 sL sG
06 rp rp
38
05
1
pL
m
m
dD
fa
76
02
2
53
dE
01
fa
bio
A
B
tA tA ntB
pn pn
p
27 28 29 30 31
00 00 00 00 00
01 02 03
11 11 11
gA
su
ctp
27 628
16
1
A3
gln
1
dD
fa
26
16
89
14
58
13
lp
99
10
00
11
67 68
09 09
87 488
14
1
m
fu
33
12
24
16
Z
m
he
57
13
32
12
96
10
62
09
31
12
2
oH
ph
60 61
09 09
RS
G
_P
PE
79 80
06 06
78
06
37
05
dE
fa
01
04
74
02
26
00
PE
72 273
02
0
5
pS
m
m
2
lE
ga
PE
23 24 25
00 00 00
qK
lp
63 64 65 66
09 09 09 09
RS
RS
G
G
T pT eT _P
_P
PE
gluas ph PE
5
pL
m
m
pn
98
03
fa
50
01
49
01
95 96 97
03 03 03
fa
D2
48
01
21 022
0
00
uT
le
A4 74 hA5
06 ec
h
ec
30
12
nF
pk
82
14
Y
oe
m
dD
cy
81
14
rp
m
sA
de
59
09
94
03
RS
G
_P
PE
31 sT
08 ly
31
05
93
03
20
00
47
01
69
02
19
00
68
02
46
01
29 30
08 08
44 45
17 17
dC
cy
qX
lp
gly
54
13
27
12
58
09
R
80
ox
m
14
53
13
lp
92
03
30
05
qP
pp
rU
na
45
01
'
27 28 05
08 08 16
IS
26
08
d
en
sB
cc
90 etZ
m
03
aA
co
rH
pu
25
08
28
05
rT
pu
66
02
43
01
pb
pA
2
bG 51 52
fa 13 13
25
12
24
12
RS
G
_P
PE
rN
pu
1
sA
de
69
06
25 26 sA
05 05 cc
76
14
15 16
16 16
22
12
90
PE PE 10
53
09
23
05
L
m
he
86
03
62 63 64
B2
02 02 02 fec
37
01
nB
pk
38 39 40
01 01 01
12 bA
00 pa
36
01
10 11
00 00
iA
85
03
18 819
08
0
rp
oB
19
05
60
02
pp
F
ph
08
00
33
dD
85
10
16
12
44
13
84
10
rD
uv
18
05
clp
iX 17
th 08
64 65 66
06 06 06
17
05
57 58 59
02 02 02
32
01
33
01
07 T T
00 ile ala
F I2
B H A
pA
his his his im his his
24 25
17 17
his
64 465 466
14
1 1
F G D
ure ure ure
D
his
62 63
14 14
E
PP
16
05
47 48
09 09
82 83
10 10
glg
A 81
gre 10
pg
dE
fa
13 C2 A2
08 se ys
s c
ats
15
05
C
ab
B 3 4
m
1 1
he 05 05
10 11
08 08
rM
pu
b
co
2
pC
fb
30
01
rA
gy
80 381 pA 383
0 um
03
0
77 378 ec
03
0 s
co
D
nir
28
01
rB
gy
56 57 58 59 60 61 62
06 06 06 06 06 06 06
s
cy
76
03
B
nir
27
01
cF 004
re
0
2
08 09 gA 11
lP
fo
12 12 ta 12
41
09
07
08
55
06
C
m
he
74 75
03 03
26
01
aN
dD
fa
U
lip
40
09
s
cp
54
06
A
m
he
73
03
08
05
dn
49 50 sp
02 02 h
RS
G
_P
pA
pe
PE
dn
2,250,000
2,100,000
1,950,000
1,800,000
1,650,000
1,500,000
1,350,000
1,200,000
1,050,000
900,000
750,000
600,000
450,000
300,000
150,000
4,350,001
4,200,001
4,050,001
3,900,001
3,750,001
3,600,001
3,450,001
3,300,001
3,150,001
3,000,001
2,850,001
2,700,001
2,550,001
2,400,001
2,250,001
lE
fo
PE
rA
ty
26
32
hD
ad
30
25
07
24
82
22
E
74 at6
PP 38 es
08
20
28
32
85
34
76
38
pro
36
dE
77
38
fa
60
37
F
lip
86
34
85
22
11
24
86
22
fts
11
20
09 10
20 20
90
30
91
30
79
38
hA
92 93
34 34
80
38
64
37
81
38
65
37
19 20 PE
P
36 36
qH
lp
18
36
91
34
PE
lp
82
38
26
36
96
34
96
30
83
38
PE
27
36
97
34
39
32
84
38
96 97
22 22
urF
m
22
24
98
22
urE
m
24
20
gK
24
24
ald
05 06
27 27
04
27
pc
59
28
09
27
29
36
4
ce
m
85
38
73
37
C2
his
pp
98
34
86
38
02 xD
35 fd
88
38
fa
26
dE
27
35
36
78
37
fa
dE
89 90 91 PE
P
38 38 38
77
37
2
lB
rm
rU
se
76
37
31 32 33
36 36 36
87
38
21
hA
E
ec
lip
30
36
qB
lp
61 62 63 64
33 33 33 33
prf
rA
fa
17
65
33
PE
79
37
34
15
IS
39
36
sp
oU
pE
gc
40
36
94
38
68
33
43
36
95
38
iB
ep
54
32
87
37
79
29
pA
99
38
hfl
66
25
dA
02
39
91
37
03
39
92
37
10
35
77
33
27
31
nA
51
36
78
33
69
25
84
28
31
27
08
39
09
39
bA
em
54 55 56 57 58
36 36 36 36 36
id
B
trb
lD
rm
31
31
lS
ts
sB
rp
60
36
10
39
bB
em
93
29
91
28
19
62
36
71
32
pC
dp
98
37
lM
cw
57
15
IS
B2 C
trx trx
pD
dp
19
hA 517
3
ec
55
24
his
16
39
cD
ac
pB
dp
rA
pa
pA
dp
19
35
73
32
rC
qc
50
20
lA
re
clp
rB
pa
20
35
00
30
0
gid 392
hF
fd
ap
58
24
E
PP
68
36
23
35
94
33
oe
cF
49 50 751
27 27
2
se
P2 lpP
clp c
37
23
25
35
aA
71
36
oE
nu
02
38
72
36
sA
id
pC
fb
73 nth
36
28
35
yA
ph
pA
da
PE
pA
fb
31
35
05
38
pfk
12
29
c
da
bH
co
glb
dg
80
36
E
PP
06 07
38 38
79
36
01
34
tC
ga
13
30
91
32
08
38
4
hiB
w
E
PP
02
34
t
la
'
nA
glf
po
34
35
03
34
oL
nu
G
pir
83
36
36
35
bK
plc
12
22
bM
co
76
24
plc
p
pe
bL
co
ffh
95
32
oN
nu
17
30
cs
V
84
36 pro
37
35
RS
G
_P
PE
85
36
ufa
A2
D
gln
E
PP
B
gln
23
30
81
10
IS
t
am
89
36
1 2
2
ltp 354 354
aB
gu
17
38
90
36
29
dE
fa
28
18
38
91
36
dE
fa
19
38
sm
78 79
27 27
E
PP
93
36
94
36
21
38
fa
95
36
glp
49
35
sI
gp
dA
hp
alr
8
pL
m
m
97
36
pA
pa
99
36
53
35
E
PP
iA
am
74
31
00
37
01 702
3
37
55
35
28
34
36
30
s2
pk
03
37
up
dA
fa
B
m
pm
hA
gs
57
35
23
26
05
37
E
PP
91
27
92
27
27
26
fa
dD
23
30
dE
fa
dB
ga
as
as
29
38
cB
32
38
31
38
aQ
dn
fa
33
dE
S
glm
hA
sd
98
27
sC
pp
33
38
se
12
37
oA
nh
rS
37 38
34 34
B
sp
etU
m
69
35
35
38
36
38
14 cR 16
37 re 37
Q
ob
67 68
35 35
39 440 rsA
34
3
m
27
33
E
nrd
J
sig
94
31
37 eA
38 ph
18
37
17
37
70
35
71 572
35
3
sD
pp
29
33
95
31
Q
glp
43
38
X
aZ
dn
gI
su
97
31
22
37
75
35
13
28
48
34
c
ac
48
22
10
61
IS
B2
ars
dA 47
so 38
26
37
79
35
50
34
35
33
34
33
99
31
37 38
33 33
22
dE
B
C
drr drr
fa
cy
sS
15
21
54
34
d1
81 82 83
35 35 35
etC
m
19
28
B
lig
28
37
qE
lp
54
38
G
49 50 51 s en
38 38 38 hn m
27
37
11 T
25 his
K
lip
12
25
81
10
IS
E2
trp
13
25
54 55 56 257
22 22 22
2
52 53
22 22
87
23
58
22
tA
etA 342
m
3
02
32
cs
ra
dA
55
38
29
37
87 588
35
3
56 57
38 38
86
35
ala
63
34
D
glt
C
lig
RS
G
_P
PE
30
37
pS
lp
ly
sU
A
nir
E
PP
Z
oe
m
07
32
33
37
B
glt
32
37
91
35
34
37
B2
ilv
37
37
62
38
63
38
E E
PP PP
oA
bp
72
34
C
clp
71
34
ats
ly
fa
sA
41 42
37 37
cy
nK
pk
s
pk
PE
17
32
43
37
68
38
sT
69
38
45 PE
37
44
37
21
32
S
vir
2
dD
83
30
80
34
22 H
32 sig
fa
51
37
70
38
71
38
49 50 rX 52 53
37 37 se 37 37
47 48
37 37
05 lK lX lP
36 fo fo fo
79
34
04
36
E
PP
20
32
81
30
s
dx
02
24
77
22
rD
py
40 sA 842
28 nu
2
01
24
76
22
pL
lp
81
26
bI
su
75
22
5
s1
pk
fB
in
1
hiB
w
03
36
E
PP
cy
74
22
15
hA 680
2
ec
sW
18
32
37 fA
28 rb
E
m
he
cy
tP
kg
46
33
40
37
r2
ls
10
61
IS
69
22
RS
G
_P
PE
68
22
pN 71 72 73
lp 22 22 22
33 34 35 36 37
21 21 21 21 21
32
21
Y'
m
he
sQ
F
din
cy
tD ntC 216
en
3
e
79
30
78
30
74 75
34 34
13
32
pX
lp
RS
G
_P
PE
76
30
67
22
2
sS
75 76
26 26
74
26
95
23
66
22
cy
pE gpA
u
ug
44
29
12
32
pB
ug
43
29
33
15
IS
lE
rh
75
30
60 61
38 38
35 736
37
3
RS
G
_P
PE
94
35
E
hp
m
RS
G
_P
PE
10
32
09
32
74
30
pC
ug
16
hA
ec
73
26
22 cpS
25
a
72
26
20
25
p
bc
tB
gg
65
22
29
21
sP 128
2
an
D
rib
7
pL
m
m
72 73
30 30
3
lB
rm
92 qF
35
lp
P
RE
70
26
64
22
sH 93
cy 23
PE
63
22
RS
G
_P
PE
25
21
28 29 30
28 28 28
08
32
69 70 71
30 30 30
fa
27
28
2
dD
26
28
lC 66 67
lB
rm rm 34 34
05
32
25
28
'
81
10
IS
A
m
pg
24
28
V 04
lip 32
utY
m
16 17
25 25
89 90
23 23
61 62
22 22
'
62 63 64 65 66 lpX 668 669
2 2
c
26 26 26 26 26
rE 66 67
em 30 30
23
28
64
30
as
m
20 21 22
28 28 28
etH
m
2
hE 260
2
ad
N
m
he
E
PP
14 515
25
2
G isI
20
21 his h
19
21
A
A lQ
D K M J fA
tru rp rpo rps rps rps rpm in
5
pA
pa
18
28
01
32
ic
10
25
btA
m
51
22
18
21
7
pK 1
lp 21
09
25
50
22
51 52 53
34 34 34
S
trp
50
26
16 17
28 28
08
25
btB
m
00
32
48
38
60
30
A
drr
10
61
IS
14
21
D1
glp
13
21
06 507
2
25
D6
12
21
14 15
28 28
24 25
37 37
44 45
38 38
rV 23
se 37
49
34
33
33
2
rD
uv
59
30
sE
pp
77
35
gA
na
nM
pk
3
dD
btC
m
sB
fa
ka
B 11
1
prc 2
T T
5 6
7 8 9
glycys alU264 264 64 64 264
v
2 2
oB coA
s
44 lT
26 va
sc
btD
m
57 058
3
30
12
28
04
16
IS
74
35
30
33
96
31
ars
cD
55 inP
30
d
47
34
10
28
prc
M
A
bD cp as
k
a
E
PE PP
ac
fa
41 42
26 26
'
55
08 09 15 11
28 28 IS 28
34
dE
20
37
fa
39 40 B
38 38 bfr
19
37
cA
39 40
26 26
ac
42
22
10
61
IS
05 06
21 21
I H
54
nrdnrd 30
05 06 07
28 28 28
6
sI lM 44 45
rp rp 34 34 344
47
15
IS
93
31
50
30
btE
m
9
E 9
cit 249 dE1
fa
e
ac
03 04
21 21
02
21
A
35 36 ed 638
d
26 26
2
02 03 04
28 28 28
25 26
33 33
49
30
01
28
92
31
hA
pd
RS
G
_P
PE
hB
pd
20 21 E3 hA C3
10
33 33 oa gp oa
61
IS
m
m
hB
sd
91
31
03
16
IS
00
28
32 33
26 26
G
46 47
30 30 nrd
C
dh
97
27
99
27
hC
pd
lZ
he
lV pE 39 40
va ah 22 22
37
22
btF
m
00
21
bD
co
91 92 93 94
24 24 24 24
29 630 631
2
26
2
90
31
fe
30
38
uA
le
fa
32
dE
hC hD
sd sd
31
dE
fa
pV
lp
28
26
PE
32 33 A 35
22 22 ptp 22
RS
G
_P
PE
76 tH tG
23 mb mb
88 89
31 31
34 35
34 34
28
38
37
15
IS
fa
dD
d
cd
10
61
IS
oA
de
33
34
27
38
ad
10
61
IS
cta
82 83 84 85 186 187
3 3
31 31 31 31
2
rB
se
sB
pp
B 94
95
tru 27
27
25 626
26
2
97
20
75
23
bC
co
96
20
hrc
30
22
RS
G
_P
PE
J2
72
23 dna
29
22
93 94 95
20 20 20
28
22
PE
02
16
IS
24
26
89
24
06 07
37 37
59
35
31
34
52
15
IS
12
33
80 81
31 31
11
33
79
31
30
34
1
ltp
22
26
88
24
40
E
15
PP IS
10
33
pB
rn
27
22
38
17 040 041
30
3
3
hA
ec
s
pp
21
dE
fa
37
30
R
sir
26
22
lY
he
67 oH 69 370
23 ph 23
2
19 20 21
26 26 26
18
26
77 78
31 31
35
30
87
27
17
26
16
26
91
20
n
pa
90
20
5
6
x
be 236 236
24
22
pE
pe
RS
G
_P
PE
32
15
IS
oD
S
lip
de
27
34
xB
fd
E
PP
iB
am
75
31
34
30
26
dD
fa
pU sO ribF
lp rp
33
30
2
iA
am
nJ
pk
14
W hA
arg ec
23
22
RS
G
_P
PE
Q
lip
sA 29
te 29
32
30
24
34
98
36
20
hA 551 552
3
3
ec
04
33
72 73
31 31
31
30
rS
th
84
24
60 61 362
23 23
2
A2
gln
56
15
IS
84 85 86 87
20 20 20 20
58 rB
23 fu
83
20
c 6
7
rn 292 292
lp
30
30
70
31
fp
pR
pe
11 sA 13
26 pg 26
83
24
E
gln
gly
82
20
I 1 2
rim 342 342
D2
glp
gc
48
35
22
38
47
35
ES
gro
5
dA
00 Y1
33 ho
p
EL
gro
3
hiB
w
67
31
fix
23
29
81
27
10
26
69
31
B
fix
ald
B2
pls
E
PP
A1
gln
81
20
pJ
lp
09
26
68
31
26 027
30
3
2
pA
pa
R2
ox
m
45
35
ats
5
6
R3 16 16
3
3
ox
m
25
30
77
27
xH
pd
81
24
10
61
IS
10
61
IIS
E
PP
19
22
79
20
54 55
23 23
78 479 480
24
2 2
D 15
13
34 sig 34
qC
lp
12
34
i
ne
24
30
Y
fts
78
20
16 pB pA
li
li
22
76
27
75
27
06
26
77
24
E
PP
76 77
20 20
c
su
71 72 pB 74
27 27 da 27
61 162 163
31
3
3
86 87 88
36 36 36
PE
60
31
aB
gu
r
lh
75
20
03 604 B2
2 tes
26
02
26
74
20
plc
hD
ep
73
20
19 PE PE PE
P P
30
PE
E
PP
oD
ch
E
PP
17
29
E
PP
06 07 08
34 34 34
94
32
lp
qA
5 67
bG 27
95 96 97 98 99 00 peE
25 25 25 25 25 26
s
fa
oM
nu
15
30
co
sig
74 75
24 24
72 73
24 24
65
27
04 05
34 34
ald
15
29
35
35
A
lig
92
32
nI
pk
gc
vT
bla
46 47 48
23 23 23
E
ilv
67
20
45
23
71
24
09
22
bI
co
vB vA vC
ru
ru ru
RS
G
_P
PE
68 469
2
24
aG
dn
tA
13
29
ga
F W 88 89
sig rsb 32 32
77 78
36 36
30
35
75 76
36 36
29
35
00
34
oG
nu
tB
ga
cA
ac
08
30
99
33
oF
nu
07
30
dD
fa
pe
pD
pQ 42
lp 23
05
22
bG
co
S
06 obT ob
c
22
c
63
20
M
S'
57 58 59 60 61 62 A yA
54
27 hsd hsd 27 27 27 27 27 27 dfr th
bT
ga
nT
as
04
22
03
22
bN
co
i
64 rp 466
24
2
hK
cb
D M 08 sP 10
trm rim 29 rp 29
pZ
lp
lp
88
25
81 82 seA 284
32 32
s 3
oD
nu
05
30
26 527
35
3
gu
4,411,529
32
dD
fa
69 hE
36 ep
24
35
95
33
cD
ac
oA oB oC
nu nu nu
B
ilv
04
30
pW
52
27
cD
se
9
pL
m
m
n
as
61
20
0
06
59 2
20
V P
gly lip
tig roU
p
2 2 G 2
sR sN m mB
rp rp rp rp
54
20
01 hB
pB lS
29 rn
le rp
85
25
59
24
78 irA
32 b
77
32
43
31
53
20
97 S3 199 taC
21 mp
2
c
m
52
20
36
23
fts
21 22 pA H
39 39 rn rpm
22
35
ac
3
s1
pk
21
35
nH
iu
rE urK
pu
p
E
ys
rB
qc
N
C
ilv ilv
47
27
42
31
B4
fa
pY
lp
aA
cm
25
dE
fa
K
ys
rA
qc
51
20
98 hD
28
fd
3
E2
fa
98
29
97
28
rA
ac
4
E2
fa
97
29
96
28
iB
pp
56
24
33
23
cta
49
20
43
ag 45 A3
27 d_ 27 gs
k
p
35
81
25
42
27
qD
lp
18
35
pfl
D
trp
ez
m
RS
G
_P
PE
B
viu
89
33
72
32
37
31
se
rA
B
lin
rC
xe
40
27
91
21
31
23
54
24
pP
lp
RS
G
_P
PE
35
dE
fa
ctp
93
28
uB
le
E
PP
94
29
E
PP
dD
fa
78
25
38 39
27 27
M 12
sig 39
ats
61
36
60
15
IS
51
24
1
rK
na
90
21
50 452 453
24
2 2
77
25
E
PP
PE
89
21
86 87
33 33
68 69
32 32
33 34
31 31
84 85
33 33
RS
G
_P
PE
sB
67
32
32
31
U U ltS
glu gln g
91
29
49
24
76
25
cA
re
88
21
2
s1
pk
27
23
73 74 75
25 25 25
va
26
23
15
dD
fa
35 cX
27 re
iC
am
34
27
90
29
87
28
89
29
bb
w
10
61
IS
lC
pS
as
fo
80 81 tB
ly
33 33
1
dD
fa
RS RS
G G
_P _P
PE PE
RS
G
_P
PE
30
31
2
lA
rm
29
31
86
28
39
15
IS
33
27
71
25
uC
le
85
28
32
27
63
32
79
33
28
31
kA 446
nd 2
70
25
25
23
85 86
21 21
24
23
82 83 84
21 21 21
2 1 23
cD cD 23
ro ro
81
21
pB leuD
hu
1
utT
m
rH
30
27
e
rn
cE
ro
68
25
py
62
32
pc
bC
em
PE
frr
26
31
RS
G
_P
PE
76
33
61
32
E
PP
04 05 06
39 39 39
49
36
2
58
32 whiB
59
32
pk
19
23
29
27
tA
dc
67
25
47
20
79 80
21 21
pI
lp
C
sp
aro
8
A
ia 272
m
24
31
83
29
'
18 18
hA hA iD
ec ec am
47 pA
36 cs
ilv
E
sp
A
m lU
rp rp
pF
22 23
31 31
gp
A
m
pm
21
31
lA
dd
da
ob
A
sp
77
21
58
15
IS
T
lip
3 9 0
sA
erT t5 7 8
m mp 28 28
cd
0 6
pt7 7
m 28
B2
ots
90
37
00 01
39 39
88 89
37 37
to
71
33
A
an
m
56
32
80
29
20
dE
3
E
sA eC oa 20
cy ss m 31
78
29
RS
G
_P
PE
45
36
2
aE
86
37
dn
53
32
B
oe
m
iL
th
38
15
IS
74
28
pro
nL
pk
42 cA 44
20 pn 20
15
23
75
21
41
20
74
21
14
23
fa
38
24
65
25
22 23
27 27
37
24
3
71 72 pt8
28 28 m
21
27
gln
sK
rb
13
23
sA
id
39 040
20
2
10 11 12
23 23 23
72
21
38
20
96 897 898
38
3 3
85
37
83
37
53
15
IS
RS
G
_P
PE
69
33
49 bB bA 52
32 ru ru 32
RS
G
_P
PE
15
31
g
un
70
28
81
10
IS
74 975
29
2
69
28
xA
le
61 62 63
25 25 25
35
24
B C D
oa oa oa 13 14
m m m 31 31
hH
sa
aA
cG
re
80 81 bE
37 37
rf
36 637 638
3 3
36
dD
08
31
72
29
67
28
60
25
9
etV 0
m 23
08
23
67 68 69
21 21 21
37
20
70 pM
21 lp
34 35 36
20 20 20
10
61
IS
33
20
17 18 19
27 27 27
16
27
59
25
15
27
65 66
28 28
14
27
57 58
25 25
E E 32 33
34
PP P 24 24
24
k
trA tm
m
71
29
64
28
07
31
trB
m
56
25
32
20
65 166
2
21
p
hs
06 07
23 23
64
21
30
20
pC pD
ah ah
05
23
13
27
pB
pb
pfk
63
28
N
lip
12
27
fp
ala
pro
62
28
eR
id
68 69
29 29
ap
54
25
26
24
28
20
03 04
23 23
01 02
23 23
60
33
42 43
32 32
57 58 59
33 33 33
41
32
00 501
35
3
51 52 53 55 lD
33 33 33 33 fo
54
33
cA
se
27
20
RS
G
_P
PE
sig
53
25
25
24
00
23
A4
gln
08
27
07
27
49 50 51 E
25 25 25 aro
tB 66
kd 29
57
28
sig
23
24
58
15
IS
G
htp
61
21
26
20
59 60
21 21
25
20
rU
pu
T
nic
pp
lp
70 771rgU rT
37
3 a se
68 69
37 37
J
es
m
rN
lp
t
hp
67
37
qG
66
37
94
34
fB 237 238
ke
3
3
94
30
95
30
62
29
63
29
rA
go
61
29
95
22
urX
m
18 19 20 21
24 24 24 24
hB
su
00
27
42
25
17
24
99
26
98
26
41
25
16
24
94
22
urD
m
20 21 22 23
20 20 20 20
18 19
20 20
RS
G
_P
54
PE
28
59 60
29 29
93
30
aro
15
24
92 93
22 22
fts
16 017
20
2
t
96 du
26
aro
95
26
aro
14
24
35
32
E
PP
92
30
58
29
52
28
93 94
26 26
31 dS 33 34
32 pv 32 32
ep
62
37
51
28
56 957
29
2
50
28
A
88 89
34 34 ots
30
32
55
29
bA
co
90
26
aro
36
25
urG
m
15
20
B
88 h
pO e
22 cd lp ss
urC
m
13
24
sT
yjc
07
16
IS
13 14
20 20
A B
trk trk
rp
Q
fts
12
20
Q
32 sB fp
25 nu e pep
10
24
78
38
13
dD
fa
54
29
co
sA
de
53
29
bB
09
24
89
26
ad
PE
83 M
22 lip
12 13 14 15 616
36 36 36 36
3
11
36
sA
cp
08 49
16 33
IS
88
30
52
29
sG
cy
'
61
48 15
33 IS
aro
87
30
51
29
efp
86 87 688
26 26
2
29
25
06
24
05
24
fd
xA
B
pit
43
21
ots
55 Z W roV
37 pro pro
p
fts
ars
82 83
34 34
25
32
85
30
rr
m
pro
29
dD
fa
81
34
24
32
R
lip
49
29
ars
43 44
28 28
83
26
25
25
pA
26 27
25 25
pR
lp
le
80
22
10
61
IS
42
21
uU
le
78 79
22 22
05
20
2
40
pE
21
da
04
20
4,350,000
4,200,000
4,050,000
3,900,000
3,750,000
3,600,000
3,450,000
3,300,000
3,150,000
3,000,000
2,850,000
2,700,000
2,550,000
2,400,000
Rv0823c
Rv0827c
Rv0890c
Rv0891c
Rv0894
Rv1019
Rv1049
Rv1129c
Rv1151c
Rv1152
Rv1167c
Rv1219c
Rv1255c
Rv1332
Rv1353c
Rv1358
Rv1359
Rv1395
Rv1404
Rv1423
Rv1460
Rv1474c
Rv1534
Rv1556
Rv1674c
Rv1675c
Rv1719
Rv1773c
Rv1776c
Rv1816
Rv1846c
Rv1931c
Rv1956
Rv1963c
Rv1985c
Rv1990c
Rv1994c
Rv2017
Rv2021c
Rv2034
Rv2175c
Rv2250c
Rv2258c
Rv2282c
Rv2308
Rv2324
Rv2358
Rv2488c
Rv2506
Rv2621c
Rv2640c
Rv2642
Rv2669
Rv2745c
Rv2779c
Rv2887
Rv2912c
Rv2989
Rv3050c
Rv3055
Rv3058c
Rv3060c
Rv3066
Rv3095
Rv3124
family)
transcriptional regulator
(NifR3/Smm1 family)
transcriptional regulator (ArsR
family)
transcriptional regulator
(LuxR/UhpA family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (MarR
family)
transcriptional regulator
(PbsX/Xre family)
putative transcriptional regulator
transcriptional regulator (GntR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(LuxR/UhpA family)
putative transcriptional regulator
transcriptional regulator
(AraC/XylS family)
transcriptional regulator (MarR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (IclR
family)
transcriptional regulator (IclR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(AraC/XylS family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (LysR
family)
putative transcriptional regulator
transcriptional regulator (MerR
family)
putative transcriptional regulator
(PbsX/Xre family)
putative transcriptional regulator
transcriptional regulator (ArsR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (LysR
family)
putative transcriptional regulator
transcriptional regulator
(Lrp/AsnC family)
transcriptional regulator (ArsR
family)
transcriptional regulator
(LuxR/UhpA family)
transcriptional regulator
(TetR/AcrR family)
putative transcriptional regulator
transcriptional regulator (ArsR
family)
transcriptional regulator (ArsR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(Lrp/AsnC family)
transcriptional regulator (MarR
family)
transcriptional regulator
(TetR/AcrR family)
transcriptional regulator (IclR
family)
putative transcriptional regulator
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator (GntR
family)
putative transcriptional regulator
putative transcriptional regulator
transcriptional regulator
(AfsR/DndI/RedD family)
Rv3160c
Rv3167c
Rv3173c
Rv3183
Rv3208
Rv3249c
Rv3291c
Rv3295
Rv3334
Rv3405c
Rv3522
Rv3557c
Rv3574
Rv3575c
Rv3583c
Rv3676
Rv3678c
Rv3736
Rv3744
Rv3830c
Rv3833
Rv3840
Rv3855
Rv0018c
ppp
Rv2234
ptpA
Rv0153c
truncated
putative phosphoprotein phosphatase
low molecular weight protein-tyrosine-phosphatase
putative protein-tyrosine-phosphatase
Rv1650
Rv2845c
Rv3834c
Rv2614c
Rv2906c
Rv3336c
Rv1689
Rv2448c
pheT
proS
serS
thrS
trmD
trpS
tyrS
valS
4. Nucleoproteins
Rv1407
fmu
Rv3852
hns
Rv2986c hupB
Rv1388
mIHF
phenylalanyl-tRNA synthase
subunit
prolyl-tRNA synthase
seryl-tRNA synthase
threonyl-tRNA synthase
tRNA (guanine-N1)-methyltransferase
tryptophanyl tRNA synthase
tyrosyl-tRNA synthase
valyl-tRNA synthase
Rv2090
Rv2191
Rv2464c
Rv3201c
Rv3202c
Rv3263
Rv3644c
6. Protein
Rv0429c
Rv2534c
Rv2882c
Rv0684
Rv0120c
Rv1080c
Rv3462c
Rv2839c
Rv1641
Rv0009
Rv2582
Rv1299
Rv3105c
Rv2889c
Rv0685
translation
def
efp
frr
fusA
fusA2
greA
infA
infB
infC
ppiA
ppiB
prfA
prfB
tsf
tuf
2. DNA
Rv0670
Rv1108c
Rv1107c
Rv2460c
clpP2
and modification
polypeptide deformylase
elongation factor P
ribosome recycling factor
elongation factor G
elongation factor G
transcription elongation factor G
initiation factor IF-1
initiation factor IF-2
initiation factor IF-3
peptidyl-prolyl cis-trans isomerase
peptidyl-prolyl cis-trans isomerase
peptide chain release factor 1
peptide chain release factor 2
elongation factor EF-Ts
elongation factor EF-Tu
Rv2457c
clpX
B. Degradation of macromolecules
1. RNA
Rv1014c pth
peptidyl-tRNA hydrolase
Rv2925c rnc
RNAse III
Rv2444c rne
similar at C-term to ribonuclease E
Rv2902c rnhB
ribonuclease HII
Rv3923c rnpA
ribonuclease P protein component
Rv1340
rphA
ribonuclease PH
end
xseA
xseB
endonuclease IV (apurinase)
exonuclease VII large subunit
exonuclease VII small subunit
3. Proteins, peptides
Rv3305c amiA
Rv3306c amiB
Rv3596c clpC
Rv2461c clpP
Rv2667
clpX'
Rv3419c
Rv2725c
Rv1223
Rv2861c
Rv0734
gcp
hflX
htrA
map
map'
Rv0319
Rv0125
Rv2213
Rv0800
Rv2467
Rv2089c
Rv2535c
Rv2782c
pcp
pepA
pepB
pepC
pepD
pepE
pepQ
pepR
Rv2109c
Rv2110c
Rv0782
Rv0781
Rv0724
prcA
prcB
ptrBa
ptrBb
sppA
Rv0198c
Rv0457c
Rv0840c
Rv0983
Rv1977
Rv3668c
Rv3671c
Rv3883c
Rv3886c
4. Polysaccharides,
lipids
Rv0062
celA
Rv3915
cwlM
Rv0315
Rv1090
Rv1327c
Rv1333
Rv3463
Rv3717
and glycopeptides
probable aminohydrolase
probable aminohydrolase
ATP-dependent Clp protease
ATP-dependent Clp protease proteolytic subunit
ATP-dependent Clp protease proteolytic subunit
ATP-dependent Clp protease
ATP-binding subunit ClpX
similar to ClpC from M. leprae but
shorter
glycoprotease
GTP-binding protein
serine protease
methionine aminopeptidase
probable methionine aminopeptidase
pyrrolidone-carboxylate peptidase
probable serine protease
aminopeptidase A/I
aminopeptidase I
probable aminopeptidase
cytoplasmic peptidase
cytoplasmic peptidase
protease/peptidase, M16 family
(insulinase)
proteasome -type subunit 1
proteasome -type subunit 2
protease II, subunit
protease II, subunit
protease IV, signal peptide peptidase
probable zinc metalloprotease
probable peptidase
probable proline iminopeptidase
probable serine protease
probable zinc metallopeptidase
probable alkaline serine protease
probable serine protease
probable secreted protease
protease
Rv2715
Rv3530c
Rv3534c
Rv3536c
lase
2-hydroxymuconic semialdehyde
hydrolase
probable cis-diol dehydrogenase
4-hydroxy-2-oxovalerate aldolase
aromatic hydrocarbon degradation
C. Cell envelope
1. Lipoproteins (lppA-lpr0) 65
2. Surface polysaccharides, lipopolysaccharides, proteins and antigens
Rv0806c cpsY
probable UDP-glucose-4epimerase
Rv3811
csp
secreted protein
Rv1677
dsbF
highly similar to C-term Mpt53
Rv3794
embA
involved in arabinogalactan synthesis
Rv3795
embB
involved in arabinogalactan synthesis
Rv3793
embC
involved in arabinogalactan synthesis
Rv3875
esat6
early secretory antigen target
Rv0112
gca
probable GDP-mannose dehydratase
Rv0113
gmhA
phosphoheptose isomerase
Rv2965c kdtB
lipopolysaccharide core biosynthesis protein
Rv2878c mpt53
secreted protein Mpt53
Rv1980c mpt64
secreted immunogenic protein
Mpb64/Mpt64
Rv2875
mpt70
major secreted immunogenic protein Mpt70 precursor
Rv2873
mpt83
surface lipoprotein Mpt83
Rv0899
ompA
member of OmpA family
Rv3810
pirG
cell surface protein precursor (Erp
protein)
Rv3782
rfbE
similar to rhamnosyl transferase
Rv1302
rfe
undecaprenyl-phosphate -Nacetylglucosaminyltransferase
Rv2145c wag31
antigen 84 (aka wag31)
Rv0431
tuberculin related peptide (AT103)
Rv0954
cell envelope antigen
Rv1514c involved in polysaccharide synthesis
Rv1518
involved in exopolysaccharide
synthesis
Rv1758
partial cutinase
Rv1910c probable secreted protein
Rv1919c weak similarity to pollen antigens
Rv1984c probable secreted protein
Rv1987
probable secreted protein
Rv2223c probable exported protease
Rv2224c probable exported protease
Rv2301
probable cutinase
Rv2345
precursor of probable membrane
protein
Rv2672
putative exported protease
Rv3019c similar to Esat6
Rv3036c probable secreted protein
Rv3449
probable precursor of serine protease
Rv3451
probable cutinase
Rv3452
probable cutinase precursor
Rv3724
probable cutinase precursor
3. Murein
Rv2911
Rv2981c
Rv3809c
Rv1018c
Rv3382c
Rv1110
Rv1315
Rv0482
Rv2152c
Rv2155c
Rv2158c
Rv2157c
Rv2153c
Rv1338
Rv2156c
Rv3332
Rv0016c
Rv2163c
Rv0050
Rv3682
Rv0017c
Rv0907
Rv1367c
Rv1730c
Rv1922
Rv2864c
Rv3330
Rv3627c
probable
probable
probable
probable
probable
probable
penicillin
penicillin
penicillin
penicillin
penicillin
penicillin
binding
binding
binding
binding
binding
binding
protein
protein
protein
protein
protein
protein
ctpA
ctpB
ctpC
ctpD
Rv0908
Rv1997
Rv1992c
Rv0425c
ctpE
ctpF
ctpG
ctpH
Rv0107c
ctpI
Rv0969
Rv3044
Rv0265c
trate
Rv1029
ctpV
fecB
fecB2
kdpA
Rv1030
kdpB
Rv1031
kdpC
Rv3236c
kefB
Rv2877c
merT
Rv1811
mgtC
Rv0362
mgtE
Rv2856
Rv0924c
nicT
nramp
Rv2691
trkA
Rv2692
trkB
Rv2287
Rv2723
yjcE
-
Rv3162c
Rv3237c
Rv3743c
potassium-transporting ATPase B
chain
potassium-transporting ATPase C
chain
probable glutathione-regulated
potassium-efflux protein
possible mercury resistance
transport system
probable magnesium transport
ATPase protein C
putative magnesium ion
transporter
probable nickel transport protein
transmembrane protein belonging
to Nramp family
probable potassium uptake protein
probable potassium uptake protein
probable Na+/H+ exchanger
probable membrane protein,
tellurium resistance
probable membrane protein
possible potassium channel
protein
probable cation-transporting
ATPase
arsA
arsB
arsB2
arsC
cysA
Rv2399c
cysT
Rv2398c
cysW
Rv1857
Rv1858
modA
modB
Rv1859
modC
Rv1860
modD
Rv2329c
Rv1737c
Rv0261c
Rv0267
narK1
narK2
narK3
narU
Rv0934
phoS1
Rv0928
phoS2
Rv0820
phoT
Rv3301c
phoY1
Rv0821c
phoY2
Rv0545c
pitA
Rv2281
Rv0930
pitB
pstA1
Rv0936
pstA2
Rv0933
pstB
Rv0935
pstC
Rv0929
pstC2
Rv0932c
pstS
Rv2400c
Rv0143c
Rv1707
Rv1739c
Rv3679
Rv3680
subI
-
drrB
Rv2938
drrC
Rv2846c
Rv3065
Rv0783c
Rv0849
Rv1145
Rv1146
Rv1250
Rv1258c
efpA
emrE
-
Rv1410c
Rv1634
Rv1819c
Rv2136c
Rv2209
Rv2333c
Rv2994
Rv1877
Rv2459
B. Chaperones/Heat shock
Rv0384c clpB
heat shock protein
Rv0352
dnaJ
acts with GrpE to stimulate DnaK
ATPase
Rv2373c dnaJ2
DnaJ homologue
Rv0350
dnaK
70 kD heat shock protein, chromosome replication
Rv3417c groEL1 60 kD chaperonin 1
Rv0440
groEL2 60 kD chaperonin 2
Rv3418c groES
10 kD chaperone
Rv0351
grpE
stimulates DnaK ATPase activity
Rv2374c hrcA
heat-inducible transcription
repressor
Rv0251c hsp
possible heat shock protein
Rv0353
hspR
heat shock regulator
Rv2031c hspX
14kD antigen, heat shock protein
Hsp20 family
Rv2299c htpG
heat shock protein Hsp90 family
Rv0563
htpX
probable (transmembrane) heat
shock protein
Rv2701c suhB
putative extragenic suppressor
protein
Rv3269
probable heat shock protein
C. Cell division
Rv3641c fic
Rv3102c ftsE
Rv3610c ftsH
Rv2748c
Rv2151c
Rv2154c
ftsK
ftsQ
ftsW
Rv3101c
Rv2921c
Rv2150c
Rv3919c
Rv3625c
Rv3917c
ftsX
ftsY
ftsZ
gid
mesJ
parA
Rv3918c
parB
Rv2922c
smc
Rv0012
Rv0435c
Rv2115c
Rv3213c
Rv1708
Rv1821
secA2
Rv2587c
Rv0638
Rv2586c
Rv1440
secD
secE
secF
secG
Rv0732
secY
Rv2462c
tig
Rv2813
unit
SecA, preprotein translocase subunit
protein-export membrane protein
SecE preprotein translocase
protein-export membrane protein
protein-export membrane protein
SecG
SecY subunit of preprotein translocase
chaperone protein, similar to
trigger factor
probable general secretion pathway protein
bpoA
Rv1123c
bpoB
Rv0554
bpoC
Rv3617
Rv1938
Rv1124
Rv2214c
Rv3670
Rv0134
Rv3171c
ephA
ephB
ephC
ephD
ephE
ephF
hpx
Rv1908c
Rv3846
Rv0432
katG
sodA
sodC
Rv1932
Rv0634c
Rv2581c
Rv3177
tpx
-
IV. Other
A. Virulence
Rv0169
mce1
Rv0589
mce2
Rv1966
mce3
Rv3499c mce4
Rv3100c smpB
Rv1694
tlyA
Rv0024
Rv0167
Rv0168
Rv0170
Rv0171
Rv0172
Rv0174
Rv0587
Rv0588
Rv0590
Rv0591
Rv0592
Rv0594
Rv1085c Rv1477
Rv1478
Rv1566c
Rv1964
Rv1965
Rv1967
Rv1968
Rv1969
Rv1971
Rv2190c
Rv3494c
Rv3496c
Rv3497c
Rv3498c
Rv3500c
Rv3501c
Rv3896c
Rv3922c
7 copies
3. Phage-related functions
Rv2894c xerC
integrase/recombinase
Rv1701
xerD
integrase/recombinase
Rv1054
integrase-a
Rv1055
integrase-b
Rv1573
phiRV1 phage related protein
Rv1574
phiRV1 phage related protein
Rv1575
phiRV1 phage related protein
Rv1576c phiRV1 phage related protein
Rv1577c phiRV1 possible prohead protease
Rv1578c phiRV1 phage related protein
Rv1579c phiRV1 phage related protein
Rv1580c phiRV1 phage related protein
Rv1581c phiRV1 phage related protein
Rv1582c phiRV1 phage related protein
Rv1583c phiRV1 phage related protein
Rv1584c phiRV1 phage related protein
Rv1585c phiRV1 phage related protein
Rv1586c phiRV1 integrase
Rv2309c integrase
Rv2310
excisionase
Rv2646
phiRV2 integrase
Rv2647
phiRV2 phage related protein
Rv2650c phiRV2 phage related protein
Rv2651c phiRV2 prohead protease
Rv2652c phiRV2 phage related protein
Rv2653c phiRV2 phage related protein
Rv2654c phiRV2 phage related protein
Rv2655c phiRV2 phage related protein
Rv2656c phiRV2 phage related protein
Rv2657c similar to gp36 of mycobacteriophage L5
Rv2658c phiRV2 phage related protein
Rv2659c phiRV2 integrase
Rv2830c similar to phage P1 phd gene
Rv3750c excisionase
Rv3751
putative integrase
68 members
22
G. Coenzyme F420-dependent
enzymes
H. Miscellaneous transferases
61
18
J. Cyclases
K. Chelatases
V. Conserved hypotheticals
912
VI. Unknowns
606
TOTAL
3924