Abstract
The genetic code deterministically maps the 64 possible codons to 20 amino acids and to "START" and "STOP" signals. This universal codon-amino acid mapping (C-AAM) is conserved across almost all living species. Its inherent redundancy, which arises from mapping 64 codons to only 22 outputs, confers resilience against certain nucleotide substitutions. This property is conceptually analogous to an error-correcting code (ECC) in communication systems, where redundancy is introduced to protect information against the most probable or most consequential transmission errors. Although coding theory has long been applied to the study of the genetic code, the biological counterparts of traditional communication-system elements, such as the "source," "encoder," and "channel," remain elusive because of their complexity and partially unknown characteristics. In this study, we adopt the perspective of a communication engineer tasked with reverse-engineering a communication system of which only one component, the decoder (the genetic code), is known. Using this reverse-engineering approach, we introduce the Finding Error Hierarchy (FEH) algorithm, which infers a hierarchy of nucleotide substitutions against which the genetic code is particularly robust. The methodology also identifies specific amino acid properties that the genetic code preferentially preserves. These findings are validated by their consistency with results from previous studies of the genetic code that used diverse methodologies. We examined mutation patterns at the codon level, allowing up to three nucleotide substitutions within a single mutation pattern. Because the vast majority of previous studies focused on point mutations, the mutation hierarchy derived here is more comprehensive. This extended hierarchy underscores the biological importance of specific mutations and offers new perspectives on the functional principles underlying the genetic code.
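To make the redundancy argument and the "up to three substitutions per codon" setting concrete, the following is a minimal Python sketch. It is not the FEH algorithm. It assumes the standard code (NCBI translation table 1), treats all stop codons as a single output `*`, and does not separate START from methionine. The helpers `neighbors` and `synonymous_fraction` are hypothetical names introduced only for illustration. Each codon has 9 single-, 27 double- and 27 triple-substitution neighbors (63 in total), and the sketch measures how often each class of substitution leaves the decoded output unchanged.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order
# with the first position varying slowest: TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon; START is not modeled separately (ATG -> 'M').
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon, k):
    """Hypothetical helper: all codons differing from `codon` at exactly k positions."""
    out = []
    for positions in combinations(range(3), k):
        # At each chosen position, substitute one of the three other bases.
        choices = [[b for b in BASES if b != codon[p]] for p in positions]
        for subs in product(*choices):
            mutant = list(codon)
            for p, b in zip(positions, subs):
                mutant[p] = b
            out.append("".join(mutant))
    return out


def synonymous_fraction(k):
    """Hypothetical helper: share of k-substitutions that keep the decoded output."""
    total = same = 0
    for codon, aa in CODE.items():
        for mutant in neighbors(codon, k):
            total += 1
            same += CODE[mutant] == aa
    return same / total


for k in (1, 2, 3):
    print(k, round(synonymous_fraction(k), 3))
```

Under this encoding, a synonymous change at all three positions can only occur between the serine codons TCN and AGY, so the triple-substitution class is almost never silent. This contrasts with the high tolerance to single substitutions that comes from third-position degeneracy.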