Abstract
BACKGROUND: Multi-locus sequence typing (MLST) is a method for differentiating bacteria based on the sequences of several housekeeping genes. Identifiers are assigned to unique allele sequences for each locus in the scheme, and the combination of these identifiers defines a genetic profile. Its reproducibility and portability across laboratories have made it a staple typing method, widely used for epidemiological and evolutionary analyses. Although MLST was traditionally based on Sanger sequencing, the method is still widely used in the era of whole-genome sequencing (WGS). Moreover, WGS has made core-genome MLST (cgMLST) possible, which scales MLST up to hundreds or thousands of loci across the genome. Conventional and cgMLST schemes are publicly available on various platforms such as PubMLST.org, BIGSdb Institut Pasteur, cgMLST.org, and EnteroBase. However, the available software for (offline) cgMLST allele calling is often not flexible enough to accommodate schemes from diverse sources and/or lacks the computational scalability required to process larger schemes efficiently. RESULTS: In this study, we present Minimap2-inferred Sequence Typing (MiST), a rapid and flexible cgMLST allele caller that has low resource requirements and can easily accommodate schemes from different sources. We benchmarked the tool against other available MLST and cgMLST calling software using WGS data from five different species and schemes collected from various sources. We demonstrate that MiST can accurately identify alleles while requiring substantially fewer computational resources than existing allele callers. CONCLUSION: MiST can help to make cgMLST analysis more accessible for integration into local bioinformatics workflows. MiST is available as an open-source Python package under the GPLv3 license at https://github.com/BioinformaticsPlatformWIV-ISP/MiST, and can be installed via pip or Conda.
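The profile concept described in the background can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions and does not reproduce MiST's algorithm or interface: the scheme contents, locus names, and the helper functions `seq_key`, `build_index`, and `call_profile` are all hypothetical, and only exact sequence matches are considered.

```python
import hashlib

# Hypothetical toy scheme: locus name -> {allele sequence: allele identifier}.
# Real schemes hold full-length gene sequences and many more alleles per locus.
TOY_SCHEME = {
    "locusA": {"ATGGCA": 1, "ATGGCT": 2},
    "locusB": {"TTGACC": 1, "TTGACG": 2},
}


def seq_key(seq: str) -> str:
    """Normalise case and hash a sequence so it can serve as a lookup key."""
    return hashlib.sha1(seq.upper().encode()).hexdigest()


def build_index(scheme):
    """Map each locus to {sequence hash: allele identifier}."""
    return {
        locus: {seq_key(seq): allele_id for seq, allele_id in alleles.items()}
        for locus, alleles in scheme.items()
    }


def call_profile(index, observed):
    """Return allele identifiers ordered by locus name.

    A locus with no exact match (or no observed sequence) yields None.
    """
    return tuple(
        index[locus].get(seq_key(observed.get(locus, "")))
        for locus in sorted(index)
    )


index = build_index(TOY_SCHEME)
profile = call_profile(index, {"locusA": "atggct", "locusB": "TTGACC"})
print(profile)  # (2, 1)
```

Two isolates with identical tuples share the same profile under this toy scheme. The sketch assumes the sequence at each locus has already been extracted from the assembly, which is the step an allele caller has to perform.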