Abstract
BACKGROUND: Virus mutants are commonly used to study the role of individual viral proteins in infection and are increasingly investigated in functional genomics experiments on infected cells that use sequencing-based assays such as RNA-seq or ATAC-seq. However, existing mutant virus strains are often poorly documented, particularly if they were created decades ago. Identifying viral variants directly in the functional genomics data avoids additional genome sequencing and makes it possible to confirm the presence of specific mutations in the experiment of interest. METHODS: We present a pipeline that identifies mutations in viral genomes directly from sequencing-based functional genomics data. The pipeline combines existing SNP callers with novel methods for identifying deletions, insertions, and the corresponding inserted sequences. These novel methods address the problem that existing structural variant callers performed poorly on functional genomics data with large variations in read coverage. RESULTS: We evaluated the pipeline on RNA-seq data from infections with knockout mutants for important proteins of Herpes simplex virus 1 (HSV-1). Comparing the variants identified by our pipeline with the descriptions in the original publications showed that the pipeline correctly recovered the introduced mutations. CONCLUSIONS: Our pipeline offers researchers a fast and easy way to identify variants in the viral genome without additional genome sequencing. The pipeline is implemented as a workflow for the workflow management system Watchdog and is available at https://github.com/watchdog-wms/watchdog-wms-workflows/ (workflow VariantCallerPipeline).