raxtax: a k-mer-based non-Bayesian taxonomic classifier



Abstract

MOTIVATION: Taxonomic classification in biodiversity studies is the process of assigning the anonymous sequences of a marker gene (barcode) or whole genomes (metagenomics) to a specific lineage using a reference database that contains named sequences in a known taxonomy. This classification is important for assessing the diversity of biological systems. Taxonomic classification faces two main challenges: first, accuracy is critical, as errors can propagate to downstream analysis results; and second, the classification time requirements can limit study size and study design, in particular when considering the constantly growing reference databases. To address these two challenges, we introduce raxtax, an efficient, novel taxonomic classification tool for barcodes that uses common k-mers between all pairs of query and reference sequences. We also introduce two novel uncertainty scores which take into account the fundamental biases of reference databases.

RESULTS: We validate raxtax on three widely used empirical reference databases and show that it is 2.7-100 times faster than competing state-of-the-art tools on the largest database while being equally accurate. In particular, raxtax exhibits increasing speedups with growing query and reference sequence numbers compared to existing tools (for 100 000 and 1 000 000 query and reference sequences overall, it is 1.3 and 2.9 times faster, respectively), and therefore alleviates the taxonomic classification scalability challenge.

AVAILABILITY AND IMPLEMENTATION: raxtax is available at https://github.com/noahares/raxtax under a CC-NC-BY-SA license. The scripts and summary metrics used in our analyses are available at https://github.com/noahares/raxtax_paper_scripts. The source code, sequence data, and summarized results of the analyses are available at https://doi.org/10.5281/zenodo.15057027.
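The abstract describes classification via k-mers shared between query and reference sequences but gives no algorithmic detail. As a rough illustration of that general idea only, the following Python sketch counts the distinct k-mers a query has in common with each reference. It is not raxtax's implementation: the function names, the choice of k, the toy sequences, and the use of plain set intersection are all assumptions made for this example, and the uncertainty scores mentioned above are not modeled.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct substrings of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(query: str, references: Dict[str, str], k: int) -> Dict[str, int]:
    """Count distinct k-mers shared between the query and each named reference.

    This is a naive all-pairs comparison, written for clarity rather than speed.
    """
    query_kmers = kmers(query, k)
    return {
        name: len(query_kmers & kmers(ref_seq, k))
        for name, ref_seq in references.items()
    }


# Toy data, invented for illustration.
example_references = {
    "ref_a": "ACGTACGA",
    "ref_b": "TTTTTTTT",
}
example_counts = shared_kmer_counts("ACGTACGT", example_references, k=4)
best_match = max(example_counts, key=example_counts.get)
```

In this toy case the query shares all four of its distinct 4-mers with `ref_a` and none with `ref_b`, so `ref_a` is the closest reference by this simple count. A real classifier would additionally need to turn such counts into confidence values per taxonomic rank.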
