Abstract
MOTIVATION: Many pathogen identification and microbiome analysis tools have been developed in recent years, with Kraken 2 being one of the most popular. While tools downstream of Kraken 2 can assist in interpreting its outputs, a statistical framework for assessing the likelihood that a taxon or organism is present in a single sample, coupled with an automated end-to-end analysis pipeline, has not yet been fully implemented.

RESULTS: Here, we introduce SPARKI, an R package that performs statistical analysis of Kraken 2 outputs and aids in the identification of pathogens present in next-generation sequencing samples. SPARKI brings a probabilistic view to Kraken 2 data, serving as a discovery tool and complementing other methods such as KrakenTools, Bracken, and Pavian.

AVAILABILITY AND IMPLEMENTATION: The SPARKI source code is available on GitHub at https://github.com/team113sanger/sparki. SPARKI is also part of an end-to-end pathogen identification pipeline, sparki-nf, which is available at https://github.com/team113sanger/sparki-nf. An additional pipeline for further exploration and validation of SPARKI results is available at https://github.com/team113sanger/map-to-genome.