
From the genetic code to the proof of Fermat’s little theorem

Nucleic acids encode the 20 amino acids found in the sequence of a protein using
just 4 bases: A, G, T, C in DNA. Thus, the 4-symbol nucleic acid alphabet encodes a
20-symbol protein alphabet. This is achieved by having 3 letters in the nucleic
acid language (a codon) code for one letter in the protein language or for the stop
sign to terminate the protein sequence. This is the famed genetic code. When we
first learned of it at age 10, we became fascinated with the process of encoding
and while playing around with codons, we learned not just the foundations of
biology but also some of the basics of combinatorics. That was perhaps the reason
why some years later in college we were quite agile with elementary combinatorics
despite not having any special mathematical spark.

The first and most obvious thing we observed was that the total number of
codons in the genetic code is the total number of permutations with replacement
that can be achieved in 3-letter words using a 4-symbol alphabet: 4\times 4\times 4 = 64.
A closer look then revealed that these 3-letter words, i.e. codons, could be
classified into groups by arranging them as ring graphs (Figure 1). Since nucleic
acids have a polarity imparted by the (deoxy)ribose sugar ring, i.e.
5'\rightarrow 3', each of these ring graphs is directed: they take the form of an
uroboros. Thus, the codons fall into 24 distinct groups.

Figure 1
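
The grouping into rings can be checked by brute force. The following is a minimal Python sketch, not part of the original note: it assumes that two codons belong to the same ring exactly when one is a circular permutation of the other, and the names `BASES`, `rotations` and `rings` are ours, chosen for illustration.

```python
from itertools import product

BASES = "AGTC"  # the 4-symbol DNA alphabet


def rotations(word):
    """Return the set of all circular permutations (rotations) of a word."""
    return {word[i:] + word[:i] for i in range(len(word))}


# All 3-letter words over the 4-symbol alphabet: 4 * 4 * 4 = 64 codons.
codons = ["".join(letters) for letters in product(BASES, repeat=3)]

# Assumption: a ring is the set of codons obtained by rotating one codon.
rings = {frozenset(rotations(codon)) for codon in codons}

homopolymeric = [ring for ring in rings if len(ring) == 1]
heteropolymeric = [ring for ring in rings if len(ring) == 3]

print(len(codons), len(rings), len(homopolymeric), len(heteropolymeric))
# prints: 64 24 4 20
```

Under that assumption, the count of 24 rings splits into 4 rings holding a single codon and 20 rings holding 3 codons each.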

Of these, 4 rings are homopolymeric, i.e. AAA, GGG, TTT and CCC. Any circular
permutation of them will yield the same codon again. Each of the remaining 20 rings
is heteropolymeric. Hence, when circularly permuted, each will always yield 3
different codons. For example, the first ring in the second row (Figure 1) will
yield AGA, GAA and AAG. Thus, we get the total number of permutations possible in
the 3-letter words with a 4-symbol alphabet as: 20 \times 3 +4 =64. This reveals a
more important truth of combinatorics: If you have any word of prime length,
then, other than for homotypic words (equivalent to a homopolymeric codon), it
will always have the same prime number of distinct circular permutations when the
letters are arranged on a directed ring graph as in Figure 1. This is because the
number of distinct circular permutations of a word must divide its length, and a
prime has no divisors other than 1 and itself. 3 is the prime number in the case
of the genetic code; hence, the circular permutations of the heteropolymeric
rings yield 3 codons each. We can express this as a generalization
thus: Let a be the number of symbols in the alphabet. Let p be a prime which is the
length of the word in that alphabet. We also insist that a is not divisible by p.
Then the total number of p-letter words will be n=a^p. Of these the homotypic words
will amount to a. Thus, the remainder will be a^p-a words. Now, these remaining
words, by the above principle of arranging on directed ring graphs, can be grouped
into k sets each of p words. Thus, a^p-a= kp; therefore it will be divisible by p.
Alternatively,

(a^p-a) \mod p =0; \; \therefore (a^{p-1}-1) \mod p=0; \; \therefore a^{p-1} \mod p =1
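
The general argument can also be tested numerically for small a and p. What follows is a hedged sketch under our own assumptions rather than a definitive implementation: symbols are represented as the integers 0 to a-1, words as tuples, and the helper names `ring_classes` and `check_ring_count` are hypothetical. It enumerates all a^p words, groups them into rings by circular permutation, and checks that every ring has size 1 or p, that exactly a rings are homotypic, and that the remaining k rings satisfy a^p - a = kp.

```python
from itertools import product


def ring_classes(a, p):
    """Group all length-p words over an a-symbol alphabet into rings.

    Two words share a ring when one is a circular permutation of the other.
    Feasible only for small a and p, since a**p words are enumerated.
    """
    seen = set()
    classes = []
    for word in product(range(a), repeat=p):
        if word in seen:
            continue
        ring = {word[i:] + word[:i] for i in range(p)}
        seen |= ring
        classes.append(ring)
    return classes


def check_ring_count(a, p):
    """Return True if the rings behave as the counting argument requires."""
    classes = ring_classes(a, p)
    sizes = {len(ring) for ring in classes}
    homotypic = sum(1 for ring in classes if len(ring) == 1)
    k = sum(1 for ring in classes if len(ring) == p)
    return sizes <= {1, p} and homotypic == a and k * p == a**p - a


# Prime word lengths: the grouping works, so p divides a^p - a.
for a, p in [(4, 3), (2, 5), (3, 5), (10, 3), (2, 7)]:
    assert check_ring_count(a, p)
    assert (a**p - a) % p == 0
    assert pow(a, p - 1, p) == 1  # holds here because p does not divide a

# Composite word length: a word such as 0101 has only 2 distinct rotations,
# so the rings no longer all have size 1 or p and the argument breaks down.
assert not check_ring_count(2, 4)
```

The last check illustrates why the word length must be prime: with length 4 some rings have size 2, and indeed 2^4 - 2 = 14 is not divisible by 4.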

This is the famous Fermat’s little theorem of arithmetic. Fermat had proposed it
without a proof, but it was subsequently proven by Leibniz. Euler published a proof
more than 50 years later, apparently unaware of Leibniz’s manuscript, which is
believed not to have been formally published. He then provided its general form,
the theorem of Euler regarding the totient function \varphi(n), which we had
encountered in the previous note. The above proof which we presented is the proof
by combinatorics. It was apparently first published by the mathematician Golomb and
is a variant of Euler’s original proof.

The periods of the reciprocals of prime numbers


Early tetrapods showed a wide range of finger-counts in their limbs: Acanthostega
had an 8-fingered limb; Ichthyostega showed a 7-fingered hind-limb; Tulerpeton had
6 fingers. Some time thereafter, perhaps in a form like Crassigyrinus, the number 5
got fixed. While there are frequent deviations from this in particular lineages,
like amphibians losing a finger in the forelimb, the 5-fingered state continued to
be the common baseline in most surviving tetrapod clades. Thus, we got our
5-fingered hands. This combined with our bilateral symmetry gave us a number system
based on the product of 2 primes: 10=2 \times 5; i.e., the decimal system. While
some islanders of Papua have apparently opted for the smaller senary system based
on 6=2\times3, the former system came to be the dominant usage of the world.

The peculiarities of the decimal system caught our fancy in our childhood, when
our father began teaching us decimal fractions. We were fascinated by the
observation that some decimal fractions terminated: \tfrac{1}{8}=0.125, whereas
others just fell into a cycle: \tfrac{1}{11}=0.090909.... We asked our father why
this was so. He told us to: 1) focus on reciprocals, i.e. fractions of the type
\tfrac{1}{a}, because all other fractions are integer multiples of such, and 2)
look out for primes. Then
