Abstract
Life as we know it is based on foldable biopolymers encoded with just 4 nucleotides or 20 amino acids. Evolution of these biopolymers requires a fast and effective search of both the conformational space, for folding, and the sequence space, for evolution. Energy landscape theory links the free energies of a polymer's possible sequences and conformations to its ability to fold, while molecular information theory connects conformational entropy with sequence entropy. Combining these two theories constrains the alphabet size of an evolving biopolymer, given its physico-chemical properties. Empirical estimates of the sizes of the effective sequence and conformational spaces of foldable proteins and RNA show that the observed alphabet sizes agree with the theoretical predictions and are just large enough to allow for biopolymer evolution. In this scenario, one effective digital alphabet letter in the sequence landscape maps to one effective analog monomer configuration in the conformational landscape. We also use current views on genetic code evolution to explore scenarios for biopolymers in the early stages of life. We find that primitive genetic systems coding for smaller amino acid alphabets may have led to a prominent presence of intrinsically disordered proteins.