Abstract
BACKGROUND: Jumbo phages, phages with comparatively large genomes, have been identified in various microbial communities. However, their diversity, genome structure, potential functions, and interactions with hosts and other phages remain largely unknown because genomic data are insufficient.

RESULTS: We collected 59,652,008 putative viral genomes from seven habitats using 38 public metagenome datasets, an integrated public viral genome database (IGN), and pig gut viral genome databases. From these, we obtained 10,754 jumbo phage genomes ranging in size from 200 to 831 kb. Most (94.64%) of these genomes were classified into Caudoviricetes, expanding the known diversity of this class. We found 2,389 species-like operational genome clusters, comprising 3,727 (34.69%) genomes, that contained no known viral genomes from the IGN, suggesting that these are potentially novel species-like genomes. Genome analysis suggested potential coevolution of jumbo phages with their habitat types and highlighted the use of alternative genetic codes, together with the corresponding suppressor tRNAs, to recode stop codons. CRISPR spacer analysis revealed potential bacterial or archaeal hosts of jumbo phages and uncovered competitive networks among them. Habitat type had an important effect on variation in phage auxiliary metabolic genes.

CONCLUSIONS: This study provides an important resource and new knowledge for future studies on the interactions between jumbo phages and their bacterial or archaeal hosts.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s42523-026-00534-z.
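
The size-based selection of jumbo phage genomes described in RESULTS can be illustrated with a short example. This is a minimal sketch, not the authors' pipeline: the names `read_fasta`, `select_jumbo`, and `JUMBO_MIN_BP` are hypothetical, and treating 200,000 bp as an inclusive cutoff is an assumption based on the 200 kb lower bound reported above. Any quality or completeness filtering used in the study is not modeled here.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed inclusive cutoff, taken from the 200 kb lower bound in the abstract.
JUMBO_MIN_BP = 200_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_jumbo(
    records: Iterable[Tuple[str, str]], min_bp: int = JUMBO_MIN_BP
) -> Dict[str, int]:
    """Return {record_id: length_bp} for genomes at least min_bp long."""
    return {name: len(seq) for name, seq in records if len(seq) >= min_bp}


# Toy input: one 250 kb contig and one 40 bp contig.
demo = [">contig_a demo", "A" * 250_000, ">contig_b", "ACGT" * 10]
print(select_jumbo(read_fasta(demo)))  # {'contig_a': 250000}
```

Keeping the parser as a generator means genomes are examined one at a time, which matters when the input holds tens of millions of putative viral genomes.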