When life was simple

Getting back to why the DNA encoding of amino acids is such a compelling argument for Darwinian evolution…

The diagram I saw in Watson’s book “Recombinant DNA” was a variant on the following table:

You probably learned in school (but forgot) that the twenty amino acids (the building blocks of all proteins — a protein is just a string of amino acids) are Alanine, Arginine, Asparagine, Aspartic acid, Cysteine, Glutamic acid, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine and Valine.

You probably also learned (and didn’t forget) that DNA provides instructions for producing these sequences of amino acids. In particular, each amino acid is encoded with a sequence (called a “codon”) of three adjacent base pairs on the DNA chain. Each base pair can be one of four types, usually labeled A, T, G or C (named after the corresponding molecules in your DNA: adenine, thymine, guanine and cytosine). In the RNA copy that the cell actually reads, uracil (U) takes the place of thymine, which is why the codons discussed here are spelled with U rather than T.

The only thing that really matters about all this, for our discussion, is what you can see in the above table: there are 64 possible codons, because there are 4×4×4 possible ways to put together three base pairs. Any one of these 64 codons ends up encoding either one of the twenty possible amino acids, or else a special instruction to start or stop the process of adding amino acids to the protein.

Of course this code is redundant, since 64 codons is a lot more than 20 amino acids. So there’s generally more than one way to encode a particular amino acid (as you can see in the table).
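If you want to check the counting yourself, here is a minimal sketch in Python. It assumes the standard genetic code written in RNA letters (U in place of T), with `*` standing for a stop codon; the names `CODON_TABLE` and `codons_per_amino_acid` are my own, made up for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: the standard genetic code, written in mRNA letters (U instead of T).
# Codons are enumerated with bases in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build all 4*4*4 codons and pair each with its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid(table):
    """Count how many codons encode each amino acid, ignoring stop codons."""
    return Counter(aa for aa in table.values() if aa != "*")


counts = codons_per_amino_acid(CODON_TABLE)
print(len(CODON_TABLE), "codons")       # 64 codons
print(len(counts), "amino acids")       # 20 amino acids
print(sum(counts.values()), "codons encode an amino acid")  # 61
```

The remaining three codons are the stop signals. Leucine, serine and arginine get six codons apiece, while methionine and tryptophan get only one each.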

The thing that struck me when I first encountered this encoding in Watson’s book was that it reminded me of one of those brain-teaser mystery stories where you are told about a crime and you try to figure out how it was done (before the detective gives it away in the end).

The mystery story in this case is all about “what happened when”. The table above clearly shows that at first there was some simpler form of proto-life way back when, which got by just fine with two base pairs. You can see this because there are a bunch of amino acids (SER, PRO, ARG, LEU, THR, VAL, ALA and GLY) with codon families that never use the third base pair at all.
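Here is a rough way to find those "third base pair doesn't matter" cases mechanically. This is only a sketch, assuming the standard genetic code in RNA letters; the function name `third_base_ignored` is hypothetical.

```python
from itertools import product

# Assumption: the standard genetic code in mRNA letters; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def third_base_ignored(table):
    """Map each two-base prefix whose four codons agree to that one amino acid."""
    result = {}
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        encoded = {table[prefix + third] for third in BASES}
        if len(encoded) == 1 and "*" not in encoded:
            result[prefix] = encoded.pop()
    return result


for prefix, amino_acid in third_base_ignored(CODON_TABLE).items():
    print(prefix + "x", "->", amino_acid)
```

Under that assumption, eight of the sixteen two-base prefixes come out this way: UC (S), CU (L), CC (P), CG (R), AC (T), GU (V), GC (A) and GG (G). Serine, leucine and arginine each pick up two extra codons elsewhere in the table, which is why the observation holds per codon family rather than per amino acid.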

Furthermore, when the third base pair is the only one that distinguishes between amino acids (as in the case of HIS and GLN), then at most two amino acids ever result, even though that third base pair has the power to identify up to four unique amino acids.

In other words, there is no system at work in how the third base pair is used. In each of these HIS/GLN kinds of cases, at some point an extra amino acid was useful to have around, and some little kluge of a change happened to allow for it.
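The "at most two" claim is easy to test. The sketch below, again assuming the standard genetic code in RNA letters, lists every two-base prefix where the third base pair does change the outcome, along with the amino acids it chooses between (H is HIS, Q is GLN). The name `split_boxes` is made up for this example.

```python
from itertools import product

# Assumption: the standard genetic code in mRNA letters; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_boxes(table):
    """For prefixes where the third base matters, list the amino acids encoded."""
    result = {}
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        encoded = {table[prefix + third] for third in BASES}
        if len(encoded) > 1:
            # Stop codons are signals, not amino acids, so leave them out.
            result[prefix] = sorted(encoded - {"*"})
    return result


boxes = split_boxes(CODON_TABLE)
for prefix, amino_acids in boxes.items():
    print(prefix + "x", "->", amino_acids)
print("most amino acids per prefix:", max(len(v) for v in boxes.values()))
```

No prefix ever fans out into three or four amino acids. The closest it gets is UG, which covers two amino acids plus a stop signal.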

Note also that the really important structural codes — the ones that signal to start and stop a protein — are mirror images of each other in those first two base pairs. Things start with “AU” and stop with “UA”. It looks as though at some point in the very distant past, back when life was simpler (literally), there might have been a two base-pair version of the codons, with AU and UA as the respective start and stop codes.

Perhaps in some long-ago transcription error, the replicating mechanism of one mutant creature started counting by threes instead of twos. But a billion years of variations might have transpired before that ever happened.

The table provides hints to a time even further back, when there may have been only one controlling base pair. Notice the second column of the table — the codons for SER, PRO, THR and ALA. Only a single base pair (the first one) distinguishes between these four amino acids. At some point in the distant past (again, looking at the table) a second base pair proved useful, and the encodings of LEU and ARG split off from PRO. Similarly, ALA split off into VAL and GLY. In each case, new kinds of proteins were now possible.
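To look at that second column, and at the rows that branch off from it, here is a small sketch. It assumes the standard genetic code in RNA letters, and the two helper names are hypothetical. In the one-letter codes, S, P, T, A are SER, PRO, THR, ALA, and L, R, V, G are LEU, ARG, VAL, GLY.

```python
from itertools import product

# Assumption: the standard genetic code in mRNA letters; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def column_by_first_base(table, second):
    """Fix the second base; list what each choice of first base encodes."""
    return {
        first: sorted({table[first + second + third] for third in BASES})
        for first in BASES
    }


def row_by_second_base(table, first):
    """Fix the first base; list what each choice of second base encodes."""
    return {
        second: sorted({table[first + second + third] for third in BASES})
        for second in BASES
    }


print(column_by_first_base(CODON_TABLE, "C"))  # the SER/PRO/THR/ALA column
print(row_by_second_base(CODON_TABLE, "C"))    # PRO and its neighbours LEU, ARG
print(row_by_second_base(CODON_TABLE, "G"))    # ALA and its neighbours VAL, GLY
```

In the second column each first base gives exactly one amino acid, while the rows starting with C and G show PRO sitting next to LEU and ARG, and ALA sitting next to VAL and GLY.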

That tacked-on third base pair encodes a lot less than the first two, but then again it never needed to do much. The critical mass of 20 amino acids, sufficient for a huge increase in protein functionality, was finally reached, and the rest (as they say) is history.

There isn’t enough information in the table to give us any precise ordering of events, and yet the history suggested by the encoding of amino acids clearly points to a kind of blind step-wise search: something happened, then something else happened, and each step was a haphazard refinement of whatever stuff was already around to work with. This is exactly what happens when randomly recombinable elements are run through a fitness function (i.e., when one combination happens to survive a tiny bit better than another).
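For a feel of what a blind step-wise search looks like, here is a toy sketch. Everything in it is an assumption made for illustration: the `fitness` function simply counts matches against an arbitrary target string, which is a crude stand-in for "survives a tiny bit better", and `blind_search` is a hypothetical name, not a model of real biology.

```python
import random

ALPHABET = "UCAG"


def fitness(sequence, target):
    """Toy fitness: number of positions that match an arbitrary target."""
    return sum(a == b for a, b in zip(sequence, target))


def blind_search(target, steps, seed=0):
    """Mutate one random position per step; keep the change if it is no worse."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in target]
    history = [fitness(current, target)]
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(ALPHABET)
        if fitness(candidate, target) >= fitness(current, target):
            current = candidate
        history.append(fitness(current, target))
    return "".join(current), history


result, history = blind_search("AUGGCUCCAUAA", steps=500)
print(result, history[0], "->", history[-1])
```

Each step only tinkers with whatever is already there, and nothing plans ahead, yet the fitness score never goes down.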

Like I said the other day, if this structure was all put there by an intelligent God — a structure that clearly suggests the gradual result of fitness-directed evolution — that God must have one hell of a sense of humor.
