Kun-peng enables scalable and accurate pan-domain metagenomic classification



Abstract

Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. On the Critical Assessment of Metagenome Interpretation II benchmark, Kun-peng reduces the memory usage of database building and querying by up to 24-fold and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which Kun-peng built with only 4.1 GB of peak memory. Kun-peng processed each sample in 0.2 to 11.2 min with 4.0 to 35.4 GB of peak memory, corresponding to a 54- to 473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8% to 94.3% of reads, improving coverage by 20% to 60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
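The abstract does not describe how the block-partitioned database works internally, so the following is only a toy sketch of the general idea: if the k-mer table is split into blocks by hash, query k-mers can be grouped by block and each block consulted in turn, so that peak memory is bounded by one block rather than by the whole database. Every name here (`block_of`, `build_blocks`, `classify`), the tiny k-mer length, the block count, and the "first taxon wins" and majority-vote rules are illustrative assumptions, not Kun-peng's actual format or algorithm.

```python
import hashlib
from collections import defaultdict

K = 5           # toy k-mer length (assumption; real classifiers use larger k)
NUM_BLOCKS = 4  # toy block count (assumption)


def kmers(seq, k=K):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def block_of(kmer, num_blocks=NUM_BLOCKS):
    """Map a k-mer to a block index with a stable hash."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_blocks


def build_blocks(references, num_blocks=NUM_BLOCKS):
    """Partition k-mer -> taxon entries into independent blocks.

    Simplification: the first taxon seen for a k-mer wins; a real
    classifier would resolve shared k-mers differently.
    """
    blocks = [dict() for _ in range(num_blocks)]
    for taxon, seq in references.items():
        for km in kmers(seq):
            blocks[block_of(km, num_blocks)].setdefault(km, taxon)
    return blocks


def classify(reads, blocks):
    """Assign each read the taxon with the most k-mer hits, one block at a time."""
    # Group query k-mers by the block that could contain them.
    pending = defaultdict(list)
    for read_id, seq in reads.items():
        for km in kmers(seq):
            pending[block_of(km, len(blocks))].append((read_id, km))

    votes = {read_id: defaultdict(int) for read_id in reads}
    for block_index, items in pending.items():
        table = blocks[block_index]  # stands in for loading one block from disk
        for read_id, km in items:
            taxon = table.get(km)
            if taxon is not None:
                votes[read_id][taxon] += 1

    return {
        read_id: (max(v, key=v.get) if v else None)
        for read_id, v in votes.items()
    }
```

In this sketch only `table` needs to be resident while a block is processed, which is the property that would let memory scale with block size instead of database size; reads with no matching k-mers are returned as `None` (unclassified).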
