Introduction
One of the most vital parts of the central dogma is the translation of mRNA into proteins,
which are constructed from the 20 standard amino acids. However, the specific reason that
these 20 amino acids were selected is unclear. One of the most distinguishing factors among
these amino acids is structure and, therefore, molecular size. If the sizes of the 20 standard
amino acids had been selected at random by nature, then their standard deviation would
resemble that of randomly generated lists of 20 amino acid sizes.
Methods
AAIndex (GenomeNet) was queried for three chemical space properties: size,
hydrophobicity, and charge. Entries were selected based on their breadth, such that entries
focusing on amino acids in specific structures were ignored. Entry DAWD720101 was
selected for size, PRAM900101 for hydrophobicity, and KLEP840101 for charge
(GenomeNet).
Each amino acid was subsequently plotted as a data point consisting of all three
properties in order to look for any possible patterns. A bubble graph was used for this
approach, with size as the X value, hydrophobicity as the Y value, and net charge
determining the bubble size: the smallest bubble represents a negative charge, a medium
bubble a neutral charge, and the largest bubble a positive charge, so as to provide a
Z axis for the graph.
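The bubble-size encoding described above can be sketched as a small mapping from net charge to marker size. This is only an illustration: the function names, the marker sizes, and the example tuples are hypothetical placeholders, not values taken from AAindex or from the original graph.

```python
def bubble_area(net_charge, small=50, medium=150, large=300):
    """Map the sign of the net charge to a marker size (arbitrary units)."""
    if net_charge < 0:
        return small    # negative charge: smallest bubble
    if net_charge > 0:
        return large    # positive charge: largest bubble
    return medium       # neutral charge: medium bubble


def bubble_points(records):
    """Turn (size, hydrophobicity, net_charge) records into (x, y, area) points."""
    return [(size, hydro, bubble_area(charge)) for size, hydro, charge in records]


# Placeholder records for illustration only; these are NOT AAindex values.
example_records = [(2.5, 0.1, -1), (3.0, -0.2, 0), (7.0, 0.3, 1)]
points = bubble_points(example_records)
print(points)
```

Each resulting tuple could then be passed to any plotting tool that accepts a per-point marker size.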
At this point, size was selected to be tested against a random Monte Carlo sample
(Woller). Using Excel, 10,000 rows of 20 cells were filled with the equation
“=RANDBETWEEN(1,7.5)/2” to generate a random number ranging from 0.5 to 7.5, the
range of sizes of the actual 20 standard amino acids. From this, 10,000 samples of randomly
generated 20 amino acid library sizes were obtained. Excel was picked over other possible
Monte Carlo generators because a data set can be recreated by modifying any cell, which
also creates a new associated graph, allowing for an essentially unlimited number of
tests with little compiling time.
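For readers who want to reproduce the sampling step outside Excel, a minimal sketch follows. It assumes that sizes are drawn uniformly from 0.5 to 7.5 in steps of 0.5, which is what the stated range suggests; the function names and the seed are illustrative choices and are not part of the original spreadsheet.

```python
import random

# Assumption: sizes are drawn uniformly from 0.5, 1.0, ..., 7.5,
# matching the range stated in the text.
SIZE_CHOICES = [k / 2 for k in range(1, 16)]


def random_library(rng, n_amino_acids=20):
    """Return one randomly generated library of amino acid sizes."""
    return [rng.choice(SIZE_CHOICES) for _ in range(n_amino_acids)]


def simulate_libraries(n_libraries=10_000, seed=None):
    """Return n_libraries random libraries (one per spreadsheet row)."""
    rng = random.Random(seed)
    return [random_library(rng) for _ in range(n_libraries)]


libraries = simulate_libraries(seed=0)
print(len(libraries), len(libraries[0]))
```

Passing a different seed plays the same role as recalculating the spreadsheet: it produces a fresh data set of 10,000 libraries.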
Next, the standard deviation of each sample was obtained using the Excel
function “=STDEV(A#:T#)” and placed into bins using the histogram function
“=FREQUENCY(U3:U10002, W4:W45)”. The bins ranged from 1.20 to 3.25 in
0.05 increments, with each bin containing all entries less than or equal to its bin value
and above the previous bin value. The histogram was plotted as a bar graph, and the actual
value was inserted for comparison.
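The standard deviation and binning steps can be sketched in the same spirit. Excel's STDEV is the sample standard deviation, which corresponds to `statistics.stdev`, and the `frequency` helper below is a hypothetical stand-in for FREQUENCY that follows the bin rule described above, with one extra slot for values above the last bin. The uniform 0.5-step sampling and the seed are assumptions.

```python
import bisect
import random
import statistics

# Bin upper limits from 1.20 to 3.25 in 0.05 increments (42 bins).
BIN_EDGES = [round(1.20 + 0.05 * k, 2) for k in range(42)]


def frequency(values, edges):
    """Count values per bin: edges[i-1] < v <= edges[i].

    The final extra slot counts values above the last edge.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_left(edges, v)] += 1
    return counts


# Assumption: sizes are drawn uniformly from 0.5 to 7.5 in steps of 0.5.
rng = random.Random(0)
sizes = [k / 2 for k in range(1, 16)]
sample_sds = [
    statistics.stdev([rng.choice(sizes) for _ in range(20)])
    for _ in range(10_000)
]
histogram = frequency(sample_sds, BIN_EDGES)
print(list(zip(BIN_EDGES, histogram)))
```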
Kung 2
Results
Figure 1. Properties of the Standard Amino Acids. Data for the 20 amino acids was plotted
on a graph comparing size to hydrophobicity and net charge. A positive hydrophobicity value
indicates hydrophobic interactions, while a negative value indicates hydrophilic interactions.
Conclusions
Figure 1 shows very little correlation among the three properties of chemical space. The
only noticeable pattern is that structures with a net charge have higher hydrophobicity,
but even that could be due strictly to chance.
Figure 2, on the other hand, shows a noticeable difference between the actual and
theoretical standard deviations of standard amino acid size. If size had not been a factor in
the selection of amino acids, a standard deviation of about 2.2 would have been expected.
However, the standard deviation of the actual standard amino acids is 1.8778, which
implies that size may have been a factor in the selection of amino acids. One likely
explanation for the lower-than-expected standard deviation is that only a few molecules are
designated to be “small building blocks,” while the remaining pieces are
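The comparison between the expected value of about 2.2 and the observed 1.8778 can be checked with a short sketch. It assumes the random sizes are uniform over 0.5 to 7.5 in steps of 0.5; the helper name and the seed are illustrative. The fraction it reports is only a rough empirical measure of how often a random library is at least as tightly clustered as the real one, not a formal significance test.

```python
import random
import statistics

# Assumption: sizes are drawn uniformly from 0.5, 1.0, ..., 7.5.
SIZES = [k / 2 for k in range(1, 16)]
OBSERVED_SD = 1.8778  # standard deviation of the actual amino acid sizes

# Population standard deviation of the size distribution itself.
expected_sd = statistics.pstdev(SIZES)  # about 2.16


def fraction_at_or_below(observed, n_libraries=10_000, library_size=20, seed=0):
    """Fraction of random libraries whose sample SD is <= the observed SD."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_libraries):
        sample = [rng.choice(SIZES) for _ in range(library_size)]
        if statistics.stdev(sample) <= observed:
            hits += 1
    return hits / n_libraries


fraction = fraction_at_or_below(OBSERVED_SD)
print(f"expected SD: {expected_sd:.4f}")
print(f"fraction of random libraries at or below observed SD: {fraction:.4f}")
```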
Sources Cited
GenomeNet. “Amino acid indices, substitution matrices and pair-wise contact potentials.”
(2008). http://www.genome.ad.jp/aaindex/ Last accessed February 12th, 2009.
Woller, Joy. “The basics of Monte Carlo Simulations.” (2008).
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html Last accessed February 12th,
2009.