Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3

Abstract

The rapid growth of genomic data over the past decade has made scalable and efficient sequence analysis algorithms, particularly those for constructing de Bruijn graphs and their colored and compacted variants, critical components of many bioinformatics pipelines. Colored compacted de Bruijn graphs condense repetitive sequence information, significantly reducing the data burden on downstream analyses such as assembly, indexing, and pan-genomics. These graphs must be constructed directly, however, because building the original uncompacted graph first is essentially infeasible at large scale. In this paper, we introduce Cuttlefish 3, a state-of-the-art parallel, external-memory algorithm for constructing (colored) compacted de Bruijn graphs. Cuttlefish 3 introduces novel algorithmic improvements that provide its scalability and speed: optimizations that significantly speed up local contractions within subgraphs, a parallel algorithm based on parallel list-ranking that joins local solutions, and a sparsification method that vastly reduces the amount of data required to compute the colored graph. Leveraging these algorithmic strategies along with algorithm-engineering optimizations in the parallel and external-memory settings, Cuttlefish 3 demonstrates state-of-the-art performance, surpassing existing approaches in speed and scalability across various genomic datasets in both colored and uncolored scenarios.
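
For readers unfamiliar with list-ranking, the primitive named in the abstract, the sketch below shows the textbook pointer-jumping technique: every node in a linked list learns its distance to the tail in a logarithmic number of rounds. The abstract does not say how Cuttlefish 3 implements or applies list-ranking, so this is only an illustration of the general idea, not the authors' method. The function name `list_rank` and the successor-array representation are assumptions made for the example, and the rounds are simulated sequentially rather than run in parallel.

```python
def list_rank(succ):
    """Return each node's distance to the tail of its linked list.

    succ[i] is the index of the node after i, or None if i is a tail.
    Hypothetical helper for illustration; not part of Cuttlefish 3.
    """
    n = len(succ)
    nxt = list(succ)
    # A tail is 0 hops from the end; every other node starts 1 hop from its successor.
    rank = [0 if s is None else 1 for s in succ]

    # Each pass of this loop stands in for one synchronous parallel round.
    while any(p is not None for p in nxt):
        new_rank = list(rank)
        new_nxt = list(nxt)
        for i in range(n):  # in a parallel setting, all i would be processed at once
            j = nxt[i]
            if j is not None:
                new_rank[i] = rank[i] + rank[j]  # absorb the successor's distance
                new_nxt[i] = nxt[j]              # jump the pointer two hops ahead
        rank, nxt = new_rank, new_nxt
    return rank


# The list 3 -> 0 -> 2 -> 1, stored as a successor array.
ranks = list_rank([2, None, 1, 0])
```

Because every pointer doubles its reach each round, a list of n nodes is fully ranked after about log2(n) rounds. One plausible use in this setting, stated here as an assumption, is ordering locally compacted path fragments along the longer path they belong to once the fragments have been linked end to end.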
