Abstract
We propose free, computationally inexpensive methods for 16S rRNA metabarcoding analysis, then optimize them and validate their accuracy for wastewater bacterial surveillance. Three taxonomic analysis pipelines were augmented: NCBI BLAST subsampling, Kraken 2/Bracken, and QIIME 2/DADA2. Our optimization strategies, designed for the high complexity of wastewater samples, raised QIIME 2/DADA2's sensitivity to species-level taxa by 240.5% and increased the species-level selectivity of Kraken 2/Bracken and NCBI BLAST subsampling by 18.7% and 79.1%, respectively. Optimization lowered the read mapping error of BLAST subsampling and Kraken 2/Bracken by 42.0% and 11.4%, respectively. The optimization strategies also improved estimates of microbial community diversity. Richness measurements became 95.6% more accurate for BLAST subsampling, 2.2% for Kraken 2/Bracken, and 37.8% for QIIME 2/DADA2. The accuracy of Shannon entropy estimates increased by 17.4% for BLAST subsampling, 19.7% for Kraken 2/Bracken, and 41.4% for QIIME 2/DADA2. For beta diversity, the accuracy of Bray-Curtis dissimilarity estimates increased by 8.5% for QIIME 2/DADA2 and by 174.3% for Kraken 2/Bracken.
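
For readers unfamiliar with the diversity metrics named above, the following is a minimal sketch of their textbook definitions, computed from per-taxon read counts. It is not the pipelines' own code: the function names and the dictionary layout are hypothetical, and the use of the natural logarithm for Shannon entropy is an assumption, since the log base is not stated here.

```python
import math


def richness(counts):
    """Richness: the number of taxa observed with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    Assumption: natural logarithm; other bases only rescale the value.
    """
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles.

    BC = 1 - 2 * sum_i min(a_i, b_i) / (sum_i a_i + sum_i b_i),
    ranging from 0 (identical profiles) to 1 (no shared taxa).
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))
```

Comparing a pipeline's estimated profile against a known reference profile with these functions is one way such accuracy figures could be derived; the exact accuracy measure behind the percentages reported above is not specified in this abstract.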