Abstract
BACKGROUND: Viruses are fundamental to many aspects of life and influence ecosystem functions. The "number of lenses" we use to explore viral diversity has expanded, yet each has limitations that constrain our view of the uncultured virosphere. It is therefore essential to evaluate how well different viromic approaches and sequencing methods recover the extant viral diversity and microdiversity present in a sample. Differences in genome recovery between technologies affect downstream estimates of viral diversity and function within a sample, which can limit our understanding of natural viral assemblages and their interactions with their microbial hosts. RESULTS: Here, using the same surface seawater sample, we compare short- and long-read viromics (i.e., Illumina, PacBio-HiFi and MinION sequencing) along with high-throughput single-virus genomics (SVG; sequencing of 700 uncultured single viruses) to explore the consensus between approaches in uncovering the extant viral diversity (sequencing effort ≈1.6 Tbp). Overall, ≈42,000 viral contigs (> 10 kb) were obtained, resulting in ≈12,500 and ≈23,400 viral OTUs (vOTUs) at the genus and species levels, respectively, infecting mostly Flavobacteriaceae and Pelagibacteraceae. At the viral family level, SVG recovered viruses with a more distinct taxonomic profile than the other methods. At lower taxonomic ranks (genus and species), < 1% of all species and genera, including some of the most abundant viruses, were captured by all methods; this value reached ≈2% when only the viromic methods (excluding SVG) were considered. The highest pairwise diversity consensus was observed between PacBio-HiFi and Illumina, with ≈11% of PacBio-HiFi species-level vOTUs also detected by Illumina.
To understand how different methods resolve the co-occurring genomic microdiversity within species, we used as a reference one of the most abundant and microdiverse viruses, the uncultured pelagiphage vSAG 37-F6 (proposed to be classified as Pelagimarinivirus ubique), originally discovered by single-virus genomics. No single method was able to assemble the complete genome, which was achieved only by combining all datasets. Similarly, none of the viral clusters at the strain level was recovered by all methods. CONCLUSIONS: Our study suggests that the inherent bias of each method still represents a challenge for the recovery of marine viral diversity, and potentially for other environmental viral samples. Nevertheless, among standard viromic techniques, PacBio-HiFi in combination with Illumina seems to perform best in absolute recovery of viral species and genera.