PHScaffolding: a hypergraph clustering and dual-weight integration strategy for scaffolding with Pore-C reads



Abstract

Genome assembly aims to reconstruct chromosome-level sequences, and scaffolding is a critical step whose accuracy depends heavily on the quality of the input data. Both Hi-C and Pore-C are used to study the three-dimensional structure of genomes, but Pore-C offers distinct advantages for high-precision assembly because it captures long-range information and multi-fragment interactions. Most current scaffolding methods nevertheless rely on Hi-C data, whose inherent technical constraints limit assembly continuity and accuracy. We propose PHScaffolding, a scaffolding method based on Pore-C data. It constructs a hypergraph from the alignment information of Pore-C reads to capture multi-way interactions among contigs, and introduces a dedicated weighting scheme for the hyperedges. PHScaffolding then applies the Louvain algorithm to cluster the hypergraph, with the aim of grouping contigs that originate from the same chromosome. Finally, within each cluster, it orders and orients the contigs using a novel strategy based on Pore-C read alignments, generating chromosome-level scaffolds. Evaluations on HG002, GM12878, and Arabidopsis thaliana contig datasets show that PHScaffolding achieves strong and robust performance in terms of NA50, NGA50, and misassembly rate. In comparative experiments, it outperforms traditional Hi-C-based scaffolding methods. The source code of PHScaffolding is available at https://github.com/Suquana/PHScaffolding.
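To make the hypergraph idea concrete, the following is a minimal Python sketch, not the PHScaffolding implementation. It assumes a hypothetical input of (read ID, contig ID) alignment pairs, treats the set of contigs hit by one Pore-C read as one hyperedge, and projects hyperedges onto contig pairs with an assumed weight of support / (k - 1) for a hyperedge of size k. The abstract does not specify the actual weighting scheme, so this weight, the function names, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_hyperedges(read_alignments):
    """Group alignments by read; each read touching >= 2 contigs is a hyperedge.

    read_alignments: iterable of (read_id, contig_id) pairs (hypothetical format).
    Returns a dict mapping frozenset of contig IDs -> number of supporting reads.
    """
    contigs_per_read = defaultdict(set)
    for read_id, contig_id in read_alignments:
        contigs_per_read[read_id].add(contig_id)

    hyperedges = defaultdict(int)
    for contigs in contigs_per_read.values():
        if len(contigs) >= 2:  # a read confined to one contig links nothing
            hyperedges[frozenset(contigs)] += 1
    return dict(hyperedges)


def clique_expand(hyperedges):
    """Project hyperedges onto weighted contig pairs.

    Assumed weighting (not the paper's): a hyperedge of size k with
    support s adds s / (k - 1) to every contig pair it contains.
    """
    pair_weights = defaultdict(float)
    for contigs, support in hyperedges.items():
        k = len(contigs)
        for a, b in combinations(sorted(contigs), 2):
            pair_weights[(a, b)] += support / (k - 1)
    return dict(pair_weights)


# Toy data: r1 is a three-way contact, r4 stays within a single contig.
alignments = [
    ("r1", "ctgA"), ("r1", "ctgB"), ("r1", "ctgC"),
    ("r2", "ctgA"), ("r2", "ctgB"),
    ("r3", "ctgD"), ("r3", "ctgE"),
    ("r4", "ctgD"),
]
hyperedges = build_hyperedges(alignments)
pair_weights = clique_expand(hyperedges)
```

The resulting pair weights define an ordinary weighted graph on which a community detection method such as Louvain could then be run to group contigs. Whether PHScaffolding clusters a projection like this or works on the hypergraph more directly is not stated in the abstract.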
