Abstract
Viruses are ubiquitous across all kingdoms of cellular life and pose a significant threat to human health. Analyzing viral communities is challenging because viruses are genetically diverse and lack a single, universally conserved marker gene. To address this challenge, we developed AliMarko, a pipeline designed to streamline virus identification in metagenomic data. The pipeline takes a dual approach: it maps reads to reference genomes, and it performs de novo assembly followed by an HMM-based homology search and phylogenetic analysis. Together, these two approaches enable comprehensive detection of viral sequences, including low-coverage and divergent ones. We applied the pipeline to total RNA sequencing data from bat feces and identified a range of viruses, quickly validating the viral sequences and assessing their phylogenetic relationships. We hope that AliMarko will be a useful resource for the scientific community, facilitating the interpretation of viral communities and advancing our understanding of viral diversity and its impact on human health.