A closed-loop method for precise genome size estimation using HiFi reads



Abstract

BACKGROUND: Super pangenomes, built from complete genome sequencing at the genus level, have provided new insights into the speciation and evolution of functional genes. Genome size (GS) estimation is a critical first step. Although K-mer-based GS evaluators are applied extensively to guide the genome assembly process and quality assessment, their results vary substantially with the tools and parameters used, presenting challenges for genus-level genome studies.

RESULTS: Here, we investigated K-mer spectra from datasets of species with and without whole genome duplication, revealing that the trade-off in K-mer length amplified the signal of genomic characteristics related to repeat content or heterozygosity. Moreover, GS predictions were influenced by genomic heterozygosity and sequencing accuracy when different K-mer lengths were employed. In contrast, consistent GS predictions were obtained across all HiFi-based evaluations, demonstrating the high accuracy of the limiting values derived from the regions where GS estimates converge as K varies continuously. Unlike traditional methods that rely on single predictions, we introduced a closed-loop GS-estimating framework that incorporates steady-value calculations, leveraging the continuity and accuracy of HiFi reads. Finally, we developed a high-performance pipeline, LVgs (https://github.com/xingjianfeng100/LVgs), by integrating FastK and GenomeScope 2.0.

CONCLUSIONS: The robustness and applicability of LVgs for genus-level species were demonstrated through its application to various diploid and polyploid species.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-025-12031-9.
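The idea of taking a steady value across K, rather than a single prediction, can be illustrated with a minimal Python sketch. This is not the LVgs implementation, which the abstract says is built on FastK and GenomeScope 2.0. The function names, the naive "total K-mers divided by peak depth" estimator, the `min_depth` error cutoff, and the `rel_tol` convergence threshold are all assumptions made for illustration.

```python
from statistics import median


def estimate_genome_size(histogram, min_depth=5):
    """Naive K-mer estimate: total K-mer count / main peak depth.

    histogram maps K-mer multiplicity -> number of distinct K-mers.
    Multiplicities below min_depth are treated as sequencing errors
    (an illustrative cutoff, not a recommended value).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no K-mers at or above min_depth")
    # Take the multiplicity shared by the most distinct K-mers as the peak.
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


def steady_value(estimates, rel_tol=0.01):
    """Return (median estimate, K values) for the longest run of
    consecutive K values whose neighbouring estimates differ by at
    most rel_tol (relative). estimates maps K -> GS estimate.
    """
    ks = sorted(estimates)
    if not ks:
        raise ValueError("no estimates given")
    best = run = [ks[0]]
    for prev, cur in zip(ks, ks[1:]):
        change = abs(estimates[cur] - estimates[prev])
        if change <= rel_tol * estimates[prev]:
            run = run + [cur]      # still inside the convergent region
        else:
            run = [cur]            # estimates jumped; start a new run
        if len(run) > len(best):
            best = run
    return median(estimates[k] for k in best), best


if __name__ == "__main__":
    # Toy per-K estimates (made-up numbers) that settle for larger K.
    per_k = {17: 900, 21: 980, 25: 1000, 29: 1002, 33: 1001}
    value, region = steady_value(per_k)
    print(f"steady GS estimate {value} from K in {region}")
```

In practice the per-K estimates would come from a model-based fit rather than the naive peak ratio used here, which ignores heterozygosity and repeat structure.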
