Abstract
Bacteriophages with single-stranded RNA (ssRNA) genomes (class Leviviricetes) are among the simplest known viruses, encoding only three core proteins: a receptor-binding protein, a capsid protein, and an RNA-dependent RNA polymerase. The number of isolated ssRNA phages has remained very low, but accumulating RNA metagenome data have uncovered a large variety of these viruses in many environments. Besides the core proteins, many of these genomes putatively encode additional proteins, which have so far remained uncharacterized. We searched for non-conserved open reading frames (ORFs) in Leviviricetes sequences from the IMG/VR virus metagenome database and used sequence- and structure-based clustering to organize them into similarity groups. Potential ORFs were found throughout the ssRNA phage genomes but almost exclusively on the positive-sense RNA strand, suggestive of their protein-coding potential. The prevalence of non-conserved ORFs varied among phage lineages, and their distribution across genome positions was markedly uneven. Most of the identified ORFs encode all-α proteins, a portion of which contain transmembrane segments that resemble a group of known ssRNA phage lysis proteins, while many others represent previously uncharacterized families of globular or semi-globular α-helical proteins. We additionally uncovered a major class of globular α/β proteins and experimentally determined the structure of a representative protein of this group. These results pave the way for further functional studies of novel ssRNA phage proteins and a better understanding of this diverse virus group.