Abstract
The relationship between genotype and phenotype underlies our ability to understand and predict evolution. Efforts to build genotype-phenotype (GP) maps have revealed several unifying rules: epistasis is pervasive, fitness effects are not normally distributed, and the GP map is non-linear and complicated for high-level phenotypes. A critical step in developing GP maps is evaluating how well predictive models explain observed phenotypes. We utilize the simplicity of a bacteriophage (ΦX174) study system to test whether an intermediate phenotype (predicted stability of the G capsid protein) explains more complex phenotypes. In doing so, we compare the predictive performance of free energies of folding and binding obtained using numerous molecular modeling methods, as well as phylogenetic and basic biochemical/biophysical properties of amino acid substitutions. By creating a large mutational library, we find that ΦX174 tolerates about 50% of the amino acid substitutions we introduced into the G protein and that molecular modeling complements other substitution models for predicting viability. Mutations predicted to have large destabilizing effects are especially informative and are almost universally detrimental. These large-effect substitutions often coincide with the most conserved residues in the G protein. Apart from large-effect mutations, our ability to predict ΦX174 phenotypes is fairly poor, and we explore potential confounding factors (e.g., codon bias) that could be considered to improve viability predictions.