
Kung 1

Selection of 20 Standard Amino Acids: Random Chance versus Biological Significance

Introduction
One of the most vital parts of the central dogma is the translation of mRNA into proteins
constructed from the 20 standard amino acids. However, the specific reason that these 20 amino
acids were selected is unclear. One of the most distinguishing features among these
amino acids is structure, and therefore molecular size. If the sizes of the
standard 20 amino acid library had been selected randomly by nature, then their standard deviation
would resemble that of randomly generated lists of 20 amino acid sizes.

Methods
AAIndex (GenomeNet) was queried for three chemical space properties: size,
hydrophobicity, and charge. Entries were selected based on their breadth, such that entries
focusing on amino acids in specific structures were ignored. Entry DAWD720101 was
selected for size, PRAM900101 for hydrophobicity, and KLEP840101 for charge
(GenomeNet).
Each amino acid was subsequently plotted as a data point consisting of all three
properties in order to look for possible patterns. A bubble graph was used for this
approach, with size as the X value, hydrophobicity as the Y value, and net charge
determining the bubble size: the smallest bubble represents a negative charge, a
medium bubble a neutral charge, and the largest a positive charge, so as to provide a
Z axis for the graph.
Size was then selected to be tested against a random Monte Carlo sample
(Woller). Using Excel, 10,000 rows of 20 cells were filled with the formula
“=RANDBETWEEN(1,7.5)/2” to generate a random number ranging from 0.5 to 7.5, the
range of sizes among the actual 20 standard amino acids. From this, 10,000 samples of randomly
generated 20 amino acid library sizes were obtained. Excel was chosen over other possible
Monte Carlo tools because a new data set can be generated by modifying
any cell, which also redraws the associated graph, allowing an essentially unlimited number of
tests with little computing time.
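
The sampling step can also be sketched outside Excel. The following is a minimal Python
sketch, not the spreadsheet used in this study. As printed, the formula's upper bound of 7.5
would not yield values up to 7.5 after halving, so the sketch assumes each size is an integer
from 1 to 15 divided by 2, which spans the stated range of 0.5 to 7.5. The function name
`generate_libraries` and the fixed seed are illustrative choices.

```python
import random

SIZE_STEPS = 15       # assumption: integers 1..15, halved, give 0.5..7.5
LIBRARY_SIZE = 20     # sizes per simulated amino acid library
N_SAMPLES = 10_000    # number of simulated libraries (rows in the spreadsheet)


def generate_libraries(n_samples=N_SAMPLES, library_size=LIBRARY_SIZE, seed=None):
    """Return n_samples lists, each holding library_size random sizes."""
    rng = random.Random(seed)
    return [
        [rng.randint(1, SIZE_STEPS) / 2 for _ in range(library_size)]
        for _ in range(n_samples)
    ]


# A fixed seed makes the run repeatable; omit it to draw a fresh data set.
libraries = generate_libraries(seed=0)
```

Calling `generate_libraries()` again without a seed plays the same role as recalculating the
spreadsheet: every call produces a new set of 10,000 random libraries.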
Next, the standard deviation of each sample was obtained using the Excel
function “=STDEV(A#:T#)” and placed into bins using the histogram function
“=FREQUENCY(U3:U10002, W4:W45)”, with bins from 1.20 to 3.25 in
0.05 increments. Each bin contains all entries less than or equal to its bin value
and above the previous bin value. The histogram was plotted as a bar graph, and the actual
value was inserted for comparison.
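
The standard deviation and binning steps can be sketched in the same way. This is a hedged
Python illustration rather than the original workbook: `bin_edges` and `frequency` are
hypothetical helper names, the random sizes again assume integers 1 to 15 halved, and the
final `share_below` figure is an added way of expressing the comparison that the original
analysis did not compute. `statistics.stdev` is the sample standard deviation, the same
quantity that Excel's STDEV reports.

```python
import bisect
import random
import statistics


def bin_edges(start=1.20, stop=3.25, step=0.05):
    """Upper bin values from start to stop inclusive (42 values by default)."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(n)]


def frequency(values, edges):
    """Mimic Excel FREQUENCY: bin i counts values <= edges[i] and above
    edges[i-1]; one extra final slot counts values above the last edge."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_left(edges, v)] += 1
    return counts


# Assumed sampling scheme: 10,000 libraries of 20 sizes on a 0.5-step grid.
rng = random.Random(0)
samples = [[rng.randint(1, 15) / 2 for _ in range(20)] for _ in range(10_000)]

sample_sds = [statistics.stdev(row) for row in samples]
edges = bin_edges()
counts = frequency(sample_sds, edges)

# Population standard deviation of the assumed 15-value grid, for reference.
grid_sd = statistics.pstdev([i / 2 for i in range(1, 16)])

# Fraction of random libraries at or below the reported actual value.
ACTUAL_SD = 1.8778
share_below = sum(sd <= ACTUAL_SD for sd in sample_sds) / len(sample_sds)
```

Under the assumed grid, `grid_sd` is about 2.16, close to the expected value of roughly 2.2
discussed in the Conclusions. The `counts` list corresponds to the histogram bars, and
`share_below` indicates how often a random library is at least as tightly clustered as the
actual one.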

Results

Figure 1. Properties of the Standard Amino Acids. Data for the 20 amino acids was plotted
on a graph comparing size to hydrophobicity and net charge. A positive hydrophobicity value
indicates hydrophobic interactions, while a negative value indicates hydrophilic interactions.

Figure 2. Distribution of 10,000 Randomly Generated Sequences of 20 Amino Acid Sizes.
This graph is dynamically generated based on the Monte Carlo dataset created in Excel. The
red line in the graph marks the standard deviation of the actual set of 20 standard amino
acids.

Conclusions
Figure 1 shows very little correlation among the three properties of chemical space. The only
noticeable pattern is that structures with a net charge have higher hydrophobicity, but even
that could be due strictly to chance.

Figure 2, on the other hand, shows a clear difference between the actual and
theoretical standard deviations of standard amino acid size. If size had not been a factor in
the selection of amino acids, a standard deviation of about 2.2 would have been expected.
However, the standard deviation of the actual standard amino acids is 1.8778, which
implies that size may have been a factor in the selection of amino acids. A likely explanation for
the lower-than-expected standard deviation is that only a few molecules are designated to be
“small building blocks,” while the remaining pieces are

Sources Cited
GenomeNet. “Amino acid indices, substitution matrices and pair-wise contact potentials.”
(2008). http://www.genome.ad.jp/aaindex/ Last accessed February 12th, 2009.
Woller, Joy. “The basics of Monte Carlo Simulations.” (2008).
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html Last accessed February 12th,
2009.
