Abstract
Bacteria play fundamental roles in ecosystems, human health, and biotechnology. Although bacterial genome sequencing data have accumulated rapidly over the past decade, the metabolic and ecological functions carried out by most sequenced bacteria remain poorly understood, apart from a few well-studied taxa and traits. Establishing a general framework that comprehensively captures the relationship between bacterial genomes and the diverse biological functions they encode remains a major challenge, as it requires embedding individual genes within their broader genomic context and modeling their combined effects across complex biological pathways and networks. The difficulty is further compounded by the limited functional annotations available for most bacterial genomes. Here, we introduce BacPT, a proteome foundation model trained on tens of thousands of complete genomes spanning diverse bacterial taxa. BacPT captures both local and genome-wide information, enabling the generation of contextualized gene embeddings and functionally rich representations of the whole genome. We demonstrate the utility of BacPT across diverse prediction tasks spanning multiple biological scales. BacPT embeddings improve the prediction of enzyme activities, biosynthetic gene clusters, metabolic traits, and ecological interaction outcomes. Our results highlight that unsupervised deep learning applied at the scale of entire proteomes provides a powerful approach for characterizing gene interactions and mapping functional landscapes for bacteria.