
Briana Halbert Bioinformatics Computer Lab October 25, 2013

Purpose
The purpose of this activity is to determine the length of the cDNA fragment and its translation initiation and termination sites, and to use web-based tools such as NCBI BLAST to find the protein sequence in one-letter abbreviations, the molecular weight, the pI, the amino acid composition, and the extinction coefficient of the protein. This information will be used together with background information to determine the functional characteristics of the assigned gene. By performing this activity, experimenters will understand the concepts of protein and DNA sequence function and their specific identities.

Background
Gene Rv0211 encodes a functional subunit that serves as a rate-limiting gluconeogenic enzyme [catalytic activity: GTP + oxaloacetate = GDP + phosphoenolpyruvate + CO2]. The function of the complex as a whole can be categorized as intermediary metabolism and respiration.

Methionine (Met) residues of proteins are readily oxidized to methionine sulfoxide (MetO), especially under oxidative stress conditions. Oxidation of Met to the R- and S-MetO stereoisomers is reversed by methionine sulfoxide reductases: MsrA reduces S-MetO and MsrB reduces R-MetO, which prevents irreversible oxidative protein damage. This protein is highly conserved, and it carries out the enzymatic reduction of methionine sulfoxide to methionine. This is important because oxidative protein damage has been linked to Alzheimer's disease, in which high oxidative stress is considered one of the major causes. The proposed function of methionine sulfoxide reductase is the repair of oxidative damage to proteins to restore biological activity.

Mycobacterium tuberculosis is the bacterium that causes the disease tuberculosis in humans. Tuberculosis (TB) is the leading cause of death in the world from a bacterial infectious disease. The disease affects 1.8 billion people per year, which is equal to one-third of the entire world population. M. tuberculosis is an obligate aerobe. Because of this, the bacterium is found in the well-aerated upper lobes of the lungs. It is primarily transmitted through the air (1). Since M. tuberculosis is a bacterium, it is prokaryotic and contains DNA. The expression of DNA, similar for all organisms, proceeds through the transcription of RNA, which is then translated into protein. The transcription of RNA is itself regulated by proteins. As mentioned previously, the product of this gene (Rv0211) functions as a rate-limiting gluconeogenic enzyme.

Bioinformatics is the field of science that focuses on the collection and analysis of biological information, such as sequences, using computers. This field grew out of the Genome Project, which allowed bioinformatics to target both biological and genomic information simultaneously.


Procedure
At the beginning of the experiment, the site http://www.ncbi.nlm.nih.gov/ was opened. The pull-down menu was used to select the Gene category, and gene Rv0211 was searched. From the search results, the topmost result was selected and the function of the gene product was recorded. The GenBank link was clicked in order to view and download the gene sequence, and the gene number was identified. The protein sequence of the gene was then viewed in one-letter code, along with the DNA sequence. Both were copied and pasted into a document. Next, the site web.expasy.org/protparam/ was opened, the protein sequence was pasted into the input box, and Compute parameters was clicked. The results reported the number of amino acids in the protein, the molecular weight of the protein, the theoretical pI, the amino acid composition, and the extinction coefficient with and without disulfide bonds. The pI was then used to determine the net charge of the protein at pH 7.0, and the proper ion exchange column was chosen for purification. The number of tyrosine and tryptophan residues was also checked using the amino acid composition. The observations were then recorded with their respective extinction coefficients. Next, the other genes were analyzed in the same way to obtain the same set of results for different data, and more observations were recorded.

The site http://www.ncbi.nlm.nih.gov was then used to select Proteins and go to the protein database. The protein BLAST tool was selected, and the protein sequence was pasted into the BLAST query box. The BLAST button was clicked, which led to a page displaying homology information. The alignments were examined; each aligned segment is shown as a series of three lines of amino acid sequence. Observations were recorded on what the first, second, and third lines of each segment represent. A BLAST alignment 95-98% similar to the assigned protein was copied and pasted, and the search data was also included in the results.
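The column choice in the procedure follows from comparing the pI with the working pH. The following is a minimal Python sketch of that rule of thumb, using the theoretical pI of 4.92 reported for Rv0211. The function names are hypothetical and are not part of ProtParam or NCBI, and a real column choice also depends on protein stability and buffer conditions.

```python
def net_charge_sign(pi: float, ph: float = 7.0) -> str:
    """Return the sign of a protein's net charge at the given pH."""
    if ph > pi:
        return "negative"  # above its pI the protein carries a net negative charge
    if ph < pi:
        return "positive"  # below its pI the protein carries a net positive charge
    return "neutral"


def suggest_ion_exchanger(pi: float, ph: float = 7.0) -> str:
    """Suggest an ion exchange resin type from the net charge sign (rule of thumb)."""
    sign = net_charge_sign(pi, ph)
    if sign == "negative":
        return "anion exchange"   # positively charged resin binds the protein
    if sign == "positive":
        return "cation exchange"  # negatively charged resin binds the protein
    return "none (pH equals pI)"


# Rv0211: theoretical pI 4.92, working pH 7.0
print(net_charge_sign(4.92), "->", suggest_ion_exchanger(4.92))
```

Under these assumptions a protein with a pI below 7.0 is negatively charged at pH 7.0, so an anion exchange column would be the usual starting point.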

Results
Amino Acid Sequence of Rv0211

MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEE
FQRLCDQLVEAGTFIRLNPEKHKNSYLALSDPSDVARVESRTYICSAKEIDAGPTNNW
MDPGEMRSIMKDLYRGCMRGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMRTMT
RMGKAALEKMGDDGFFVKALHSVGAPLEPGQKDVAWPCSETKYITHFPETREIWSYGS
GYGGNALLGKKCYSLRIASAMAHDEGWLAEHMLILKLISPENKAYYFAAAFPSACGKT
NLAMLQPTIPGWRAETLGDDIAWMRFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMR
TIAAGNTVFTNVALTDDGDVWWEGLEGDPQHLIDWKGNDWYFRETETNAAHPNSRYCT
PMSQCPILAPEWDDPQGVPISGILFGGRRKTTVPLVTEARDWQHGVFIGATLGSEQTA
AAEGKVGNVRRDPMAMLPFLGYNVGDYFQHWINLGKHADESKLPKVFFVNWFRRGDDG
RFLWPGFGENSRVLKWIVDRIEHKAGGATTPIGTVPAVEDLDLDGLDVDAADVAAALA
VDADEWRQELPLIEEWLQFVGEKLPTGVKDEFDALKERLG

Figure 1. Amino Acid Sequence



Gene                      Rv0211 (fourth gene)
# of amino acids          606
Molecular weight          67253.0 Da
Theoretical pI            4.92
Extinction coefficient    134340 M^-1 cm^-1

Table 1. Gene Fourth Data

Amino acid    Count    Percent
Ala (A)       54       8.9%
Arg (R)       31       5.1%
Asn (N)       22       3.6%
Asp (D)       43       7.1%
Cys (C)       9        1.5%
Gln (Q)       14       2.3%
Glu (E)       43       7.1%
Gly (G)       58       9.6%
His (H)       12       2.0%
Ile (I)       24       4.0%
Leu (L)       49       8.1%
Lys (K)       28       4.6%
Met (M)       19       3.1%
Phe (F)       26       4.3%
Pro (P)       37       6.1%
Ser (S)       26       4.3%
Thr (T)       36       5.9%
Trp (W)       20       3.3%
Tyr (Y)       16       2.6%
Val (V)       39       6.4%
Pyl (O)       0        0.0%
Sec (U)       0        0.0%

Table 2. Amino Acid Composition

Total number of tyrosine and tryptophan    16 + 20 = 36
Total number of cysteine                   9
Wavelength                                 280 nm
Molar extinction without disulfides        133840 M^-1 cm^-1
Molar extinction with all disulfides       134340 M^-1 cm^-1

Table 3. Extinction Coefficient for Rv0211
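The residue counts reported by ProtParam can be cross-checked directly from the sequence in Figure 1. The sketch below is only an illustration of that check in Python; the names RV0211 and composition are hypothetical, and the percentages are rounded to one decimal place as in the composition table.

```python
from collections import Counter

# Rv0211 protein sequence, copied from Figure 1 (one-letter code).
RV0211 = (
    "MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEE"
    "FQRLCDQLVEAGTFIRLNPEKHKNSYLALSDPSDVARVESRTYICSAKEIDAGPTNNW"
    "MDPGEMRSIMKDLYRGCMRGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMRTMT"
    "RMGKAALEKMGDDGFFVKALHSVGAPLEPGQKDVAWPCSETKYITHFPETREIWSYGS"
    "GYGGNALLGKKCYSLRIASAMAHDEGWLAEHMLILKLISPENKAYYFAAAFPSACGKT"
    "NLAMLQPTIPGWRAETLGDDIAWMRFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMR"
    "TIAAGNTVFTNVALTDDGDVWWEGLEGDPQHLIDWKGNDWYFRETETNAAHPNSRYCT"
    "PMSQCPILAPEWDDPQGVPISGILFGGRRKTTVPLVTEARDWQHGVFIGATLGSEQTA"
    "AAEGKVGNVRRDPMAMLPFLGYNVGDYFQHWINLGKHADESKLPKVFFVNWFRRGDDG"
    "RFLWPGFGENSRVLKWIVDRIEHKAGGATTPIGTVPAVEDLDLDGLDVDAADVAAALA"
    "VDADEWRQELPLIEEWLQFVGEKLPTGVKDEFDALKERLG"
)


def composition(seq: str) -> dict:
    """Map each residue letter to (count, percent of sequence length)."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: (n, round(100 * n / total, 1)) for aa, n in sorted(counts.items())}


comp = composition(RV0211)
print("length:", len(RV0211))
for aa in "WYC":  # residues that enter the extinction coefficient
    count, pct = comp[aa]
    print(f"{aa}: {count} ({pct}%)")
```

Counting from the sequence rather than retyping the table is one way to catch transcription errors before the values are used in later calculations.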


Gene #    Group Name                 # of Trp + Tyr (Total)    Extinction Coefficient
1         Oliver, Faine              12                        34045
2         Young, Hendricks           22                        56965
3         Wilson, Davis, Brownley    9                         25440
4         Graham, Mosley             36                        134340

Table 4. Four Experimental Data Groups

Score         1155 bits (2987)
Expect        0.0
Method        Compositional matrix adjust.
Identities    548/605 (91%)
Positives     577/605 (95%)
Gaps          0/605 (0%)

Table 5. Homolog of Rv0211

Query  1    MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEEFQRLCDQLVEAGTFIR  60
            MTSATIPGLDTAPTNHQGLLSWV+EVAELTQPDRVVF DGS+EEF RL  QLV+AGTF R
Sbjct  1    MTSATIPGLDTAPTNHQGLLSWVQEVAELTQPDRVVFADGSDEEFHRLSAQLVDAGTFTR  60

Query  61   LNPEKHKNSYLALSDPSDVARVESRTYICSAKEIDAGPTNNWMDPGEMRSIMKDLYRGCM  120
            LN EK  NSYLALSDPSDVARVESRT+ICS +EIDAGPTNNWMDP EMR++M DLYRGCM
Sbjct  61   LNDEKFPNSYLALSDPSDVARVESRTFICSEREIDAGPTNNWMDPSEMRTLMTDLYRGCM  120

Query  121  RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMRTMTRMGKAALEKMGDDGFFVKAL  180
            RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSM+ MTRMG AALEKMG DGFFVKAL
Sbjct  121  RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMKVMTRMGTAALEKMGQDGFFVKAL  180

Query  181  HSVGAPLEPGQKDVAWPCSETKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA  240
            HSVGAPLE GQ DV WPCS+TKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA
Sbjct  181  HSVGAPLEDGQADVPWPCSDTKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA  240

Query  241  HDEGWLAEHMLILKLISPENKAYYFAAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM  300
             DEGWLAEHMLILKLISPENKAYY AAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM
Sbjct  241  RDEGWLAEHMLILKLISPENKAYYIAAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM  300

Query  301  RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDGDVWWEGLE  360
            RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDG+VWWEGLE
Sbjct  301  RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDGEVWWEGLE  360

Query  361  GDPQHLIDWKGNDWYFRETETNAAHPNSRYCTPMSQCPILAPEWDDPQGVPISGILFGGR  420
            GDPQHL+DWKGN+WYFRETET AAHPNSRYCTPMSQCPILAPEWDDPQGVPIS ILFGGR
Sbjct  361  GDPQHLVDWKGNEWYFRETETTAAHPNSRYCTPMSQCPILAPEWDDPQGVPISAILFGGR  420

Query  421  RKTTVPLVTEARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPFLGYNVGDYFQH  480
            RKTTVPLVT+ARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPF+GYNVGDY QH
Sbjct  421  RKTTVPLVTQARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPFMGYNVGDYVQH  480

Query  481  WINLGKHADESKLPKVFFVNWFRRGDDGRFLWPGFGENSRVLKWIVDRIEHKAGGATTPI  540
            WI++GK++DESKLP+VFFVNWFRRG+D RFLWPGFGENSRV+KWIVDRIEHKAGG TTPI
Sbjct  481  WIDIGKNSDESKLPQVFFVNWFRRGEDHRFLWPGFGENSRVMKWIVDRIEHKAGGKTTPI  540

Query  541  GTVPAVEDLDLDGLDVDAADVAAALAVDADEWRQELPLIEEWLQFVGEKLPTGVKDEFDA  600
            GTVP VEDLDL+GLD + ADV+ ALAV+A+EWR+ELPLIEEWLQF+GEKLPTG+KDEFDA
Sbjct  541  GTVPTVEDLDLEGLDANPADVSEALAVNAEEWREELPLIEEWLQFIGEKLPTGIKDEFDA  600

Query  601  LKERL  605
            LKERL
Sbjct  601  LKERL  605


Discussion
Homologs are useful in confirming the function of a gene, because the known function of a homologous gene can be used to infer the function of the gene of interest. In the homology search, it was important to find a homolog that had a high percentage of similarity. The alignment with the chosen homolog spans 605 amino acids, as opposed to the 606 amino acids in Rv0211. The identity with Rv0211 is 91% (548/605), and 95% of positions (577/605) are scored as positives. In the alignment, the middle line marks each position: a letter indicates that the two sequences have the identical residue, a + sign indicates that the two residues have similar chemical characteristics, and a blank space indicates that the residues of the homolog and Rv0211 are not similar. Gaps, which would incur penalties in the score, do not occur in this alignment (0/605). This homolog represents a phosphoenolpyruvate carboxykinase, like Rv0211. Phosphoenolpyruvate carboxykinase is an important enzyme in gluconeogenesis. In mammals it is found in both the cytosol and the mitochondria of liver cells. The enzyme is regulated by insulin, glucocorticoids, cyclic adenosine monophosphate (cAMP), and diet to maintain glucose homeostasis. Two types of phosphoenolpyruvate carboxykinase exist: PCK1 (PEPCK1, the cytosolic form) and PCK2 (PEPCK2, the mitochondrial form).
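The identity figure in a BLAST report can be reproduced by comparing the aligned Query and Sbjct lines position by position. The following is a rough Python sketch using only the first 60-residue segment of the alignment in the Results; the function names are hypothetical. It covers identities only, since scoring positives requires the substitution matrix that BLAST uses, which is not reproduced here.

```python
# First aligned segment (positions 1-60), copied from the BLAST output.
QUERY = "MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEEFQRLCDQLVEAGTFIR"
SBJCT = "MTSATIPGLDTAPTNHQGLLSWVQEVAELTQPDRVVFADGSDEEFHRLSAQLVDAGTFTR"


def count_identities(query: str, subject: str) -> int:
    """Count positions where two equal-length aligned sequences match exactly."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(q == s for q, s in zip(query, subject))


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as in the BLAST summary."""
    return round(100 * part / whole)


segment_identities = count_identities(QUERY, SBJCT)
print(f"segment 1-60: {segment_identities}/{len(QUERY)} identical")
print(f"whole alignment: {percent(548, 605)}% identities, {percent(577, 605)}% positives")
```

Applying the same count to every segment and summing the results should recover the 548/605 total reported in the BLAST summary.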

The combined number of tryptophan and tyrosine residues in a protein also has a large effect on its molar extinction coefficient. The extinction coefficient of a protein at 280 nm depends almost exclusively on the number of aromatic residues, particularly tryptophan, and can be predicted from the sequence of amino acids. The molar extinction coefficient is a measurement of how strongly a chemical species absorbs light at a given wavelength. For Rv0211, where the Tyr + Trp sum is 36, the molar extinction coefficients with and without disulfides are 134340 and 133840, respectively. These are far greater than the value for Rv0137c, whose Tyr + Trp sum is 12 and whose molar extinction coefficient is 34045. Rv0137c in turn has a higher molar extinction coefficient with disulfides than Rv0162c, whose value is 25440. Rv01472 has the second highest molar extinction coefficient with disulfides, 56965.
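The prediction from residue counts can be sketched in a few lines of Python. This assumes the per-residue values at 280 nm used by ExPASy ProtParam (5500 for Trp, 1490 for Tyr, and 125 per cystine, in M^-1 cm^-1) and assumes that, in the "with disulfides" case, every available pair of cysteines forms a cystine. The function name is hypothetical.

```python
# Molar extinction contributions at 280 nm, in M^-1 cm^-1 (values assumed from ProtParam).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           disulfides: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm from residue counts."""
    # Two cysteines form one cystine; an odd cysteine is left unpaired.
    n_cystine = n_cys // 2 if disulfides else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Rv0211: 16 Tyr, 20 Trp, 9 Cys
with_ss = extinction_coefficient(16, 20, 9, disulfides=True)
without_ss = extinction_coefficient(16, 20, 9, disulfides=False)
print(with_ss, without_ss)
```

With the Rv0211 counts this gives 134340 with disulfides and 133840 without, matching the reported values. The 20 tryptophans alone contribute 110000, which illustrates why tryptophan dominates the absorbance at 280 nm.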

References

1. Todar, Kenneth. "Tuberculosis." Todar's Online Textbook of Bacteriology. N.p., 2008. Web. 5 Oct. 2010. <http://www.textbookofbacteriology.net/tuberculosis.html>.
2. "Patient.co.uk - Trusted Medical Information and Support." Patient.co.uk. N.p., n.d. Web. 01 Nov. 2013.
