Abstract
Nucleus-forming phages are a class of jumbo phages that form a proteinaceous nuclear shell within the bacterial host during infection. This shell protects the replicating phage genome by excluding host defense factors. However, the lack of genetic characterization of these phages limits our understanding of them. Here, we used HMMER and RoseTTAFold to assemble a proposed dataset of nucleus-forming phages, identified by the presence of chimallin and PhuZ tubulin genes, comprising 1,103 phage genomes (including 406 high-quality genomes) from 16 million published phage genomes. The high-quality genomes range in length from 200 to 324 kb. A cluster analysis conducted using vConTACT2 revealed that these phages could be classified into 21 distinct viral clusters. A phylogenetic analysis demonstrated that these phages form clades independent of those of other jumbo phages. We characterized the distribution of these phages across Earth's ecosystems; notably, they are present in human oral samples. Further annotation revealed that these genomes encode genes for DNA replication, DNA repair, and multiple anti-defense systems, suggesting that these phages possess unique adaptations that enable them to thrive in their respective environments. In conclusion, this study explored in detail the diversity, distribution, and evolutionary characteristics of chimallin- and PhuZ-encoding phages, establishing a foundation for further research on the possible regulatory functions of nucleus-forming phages in ecosystems and their effects on human health.