Abstract
BACKGROUND: The birth of new genes from non-coding sequences has been postulated to be preceded by a proto-gene phase, in which a sequence is translated into protein but does not exhibit hallmarks of a clear function. Despite the abundance of such proto-genes in bacterial genomes, it remains unknown how frequently they emerge and whether they actually act as precursors of new genes in natural populations. RESULTS: To address these questions, we combined transcriptomic, proteomic, and comparative genomic approaches to identify and analyze hundreds of novel bacterial protein-coding genes that had previously escaped annotation. These novel proteins, many of which are restricted to narrow taxonomic ranges, display sequence properties indistinguishable from those of the non-coding regions of the genome, suggesting that the vast majority are evolving neutrally. Despite their abundance and high degree of taxonomic restriction, we were able to rigorously establish the de novo emergence of only one proto-gene within the history of Escherichia coli, highlighting the difficulty of detecting this mode of gene birth in bacterial genomes. Contrary to expectations, we find that proto-genes emerge at a uniform rate across distant bacterial taxa despite significant differences in their genomic characteristics, suggesting the presence of taxon-specific mechanisms that regulate their origination and persistence. CONCLUSIONS: Overall, our findings indicate that proto-genes regularly emerge in bacterial populations but that their sequence properties furnish little evidence that they serve as precursors to new genes.