Abstract
This study investigates viral composition in wastewater through metagenomic analysis, evaluating the performance of four bioinformatic tools (Genome Detective, CZ.ID, INSaFLU-TELEVIR and Trimmomatic + Kraken2) on samples collected in April 2019 from four sites in each of two wastewater treatment plants (WWTPs) in Lisbon, Portugal. From each site, we collected three replicates and processed them separately, together with one pool of the nucleic acids extracted from those replicates. In total, 32 samples were processed using sequence-independent single-primer amplification (SISPA) and sequenced on an Illumina MiSeq platform. Across the 128 sample-tool combinations, viral read counts varied widely, from 3 to 288,464. Viral abundance and diversity were inconsistent between replicates and their pools, reflecting the heterogeneity of the wastewater matrix and variability in sequencing effort. Results also differed between software tools, highlighting the impact of tool selection on community profiling. crAssphage correlated positively with human pathogens, supporting its use as a proxy in public health surveillance. A custom Python pipeline automated the processing of viral identification reports, taxonomic assignment and diversity calculations, streamlining the analysis and ensuring reproducibility. These findings emphasize the importance of sequencing depth, software tool selection and standardized pipelines in advancing wastewater-based epidemiology.
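As an illustration of the study design and of the kind of diversity calculation such a pipeline might automate, the following minimal Python sketch enumerates the sample-tool combinations and computes a Shannon index. It is not the authors' pipeline: the plant, site and replicate labels, the function name `shannon_index`, the choice of the Shannon index as the diversity metric, and the toy read counts are all assumptions made for the example.

```python
import math
from itertools import product


def shannon_index(read_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero reads."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in read_counts.values()
        if n > 0
    )


# Study design as stated in the abstract:
# 2 WWTPs x 4 sites x (3 replicates + 1 pool) = 32 samples.
# All labels below are hypothetical placeholders.
plants = ["WWTP1", "WWTP2"]
sites = ["S1", "S2", "S3", "S4"]
preps = ["rep1", "rep2", "rep3", "pool"]
tools = ["Genome Detective", "CZ.ID", "INSaFLU-TELEVIR", "Trimmomatic + Kraken2"]

samples = [f"{p}_{s}_{r}" for p, s, r in product(plants, sites, preps)]
combinations = list(product(samples, tools))  # 32 samples x 4 tools = 128

# Invented toy read counts for one sample-tool combination (not study data).
toy_counts = {"crAssphage": 50, "taxon_B": 25, "taxon_C": 25}
h = shannon_index(toy_counts)

print(len(samples), len(combinations), round(h, 4))
```

Computing the index per sample-tool combination, rather than per sample, keeps differences between tools visible alongside differences between replicates and pools.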