FastDup: a scalable duplicate marking tool using speculation-and-test mechanism

Abstract

SUMMARY: Duplicate marking is a critical preprocessing step in gene sequence analysis that flags redundant reads arising from polymerase chain reaction amplification and sequencing artifacts. Although Picard MarkDuplicates is widely recognized as the gold-standard tool, its single-threaded implementation and reliance on global sorting incur significant computational and resource overhead, which limits its efficiency on large-scale datasets. Here, we introduce FastDup, a high-performance, scalable duplicate marking tool built on a speculation-and-test mechanism. With 32 threads, FastDup achieves up to a 20× throughput speedup while guaranteeing output identical to that of Picard MarkDuplicates.

AVAILABILITY AND IMPLEMENTATION: FastDup is a C++ program released under the MIT license. It is available from Zenodo (https://zenodo.org/records/15727829), Bioconda (https://anaconda.org/bioconda/fastdup) and GitHub (https://github.com/zzhofict/FastDup.git).
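For readers unfamiliar with the task itself, the following is a minimal sketch of what duplicate marking does in general. It is not FastDup's code and does not model the speculation-and-test mechanism, which the abstract does not describe. The `Read` struct, its fields, and `mark_duplicates` are hypothetical names introduced for illustration, and the grouping key (reference, 5' position, strand) with a quality-based tie-break is a simplifying assumption loosely modelled on how such tools behave, not an exact reproduction of Picard MarkDuplicates.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified read record. Real tools operate on SAM/BAM records.
struct Read {
    std::string name;
    std::int32_t ref_id;      // index of the reference sequence
    std::int64_t five_prime;  // 5' alignment coordinate (assumed already unclipped)
    bool reverse;             // true if aligned to the reverse strand
    int quality_sum;          // score used to choose the read that is kept
    bool duplicate = false;   // set by mark_duplicates
};

// Flags every read except the best-scoring one in each
// (ref_id, five_prime, reverse) group. On a score tie the earlier read is kept.
// Returns the number of reads flagged as duplicates.
std::size_t mark_duplicates(std::vector<Read>& reads) {
    typedef std::tuple<std::int32_t, std::int64_t, bool> Key;
    std::map<Key, std::size_t> best;  // group key -> index of current best read
    std::size_t flagged = 0;

    for (std::size_t i = 0; i < reads.size(); ++i) {
        reads[i].duplicate = false;
        Key key(reads[i].ref_id, reads[i].five_prime, reads[i].reverse);
        auto result = best.emplace(key, i);
        if (result.second) {
            continue;  // first read seen for this group
        }
        std::size_t j = result.first->second;
        if (reads[i].quality_sum > reads[j].quality_sum) {
            reads[j].duplicate = true;  // previous best is demoted
            result.first->second = i;
        } else {
            reads[i].duplicate = true;
        }
        ++flagged;
    }
    return flagged;
}
```

This single-pass version keeps one map entry per group and never sorts the input. It is meant only to make the notion of "flagging redundant reads" concrete; paired-end handling, optical duplicates, and multithreading are all out of scope.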
