hmmibd-rs: An enhanced hmmIBD implementation for parallelizable identity-by-descent detection from large-scale Plasmodium genomic data



Abstract

BACKGROUND: Identity-by-descent (IBD), which describes recent genetic co-ancestry between pairs of genomes, is a fundamental concept in population genomics. It has been used to estimate genetic relatedness, detect selection signals, and understand population demography. The IBD detection method hmmIBD infers IBD segments between haploid genomes, including those of Plasmodium falciparum, with high accuracy and is widely used in malaria genomic surveillance. However, the current implementation of hmmIBD is single-threaded and does not use the full capacity of multi-processor computers, which makes it difficult to apply to large data sets. It also does not accommodate non-uniform recombination rates across the genome.

METHODS: We developed an enhanced implementation of hmmIBD in the Rust programming language, named hmmibd-rs. It uses multi-threaded computing to parallelize IBD inference over genome pairs, and it supports optional, user-defined recombination rate maps for more accurate IBD detection and filtration in genomes with non-uniform recombination. We further streamlined large-scale IBD detection by adding built-in auxiliary functions that preprocess input directly from the standard binary variant call format (BCF) and filter IBD output to reduce disk usage.

RESULTS: With the new implementation, IBD detection time decreases nearly linearly as the number of CPU threads increases. Using 128 threads shortens IBD detection from 5.2 days to 1.3 hours for 220 million pairs of simulated Plasmodium falciparum-like chromosomes, an approximately 100x speedup over the single-threaded hmmIBD algorithm. Incorporating non-uniform recombination rates in hmmibd-rs enhances the accuracy of IBD inference by mitigating the overestimation of IBD breakpoints in recombination cold spots and their underestimation in hot spots. It also improves IBD segment length filtration, reducing the false positive rate in recombination cold spots and the false negative rate in hot spots. When applied to empirical data sets, hmmibd-rs completes IBD detection from MalariaGEN Pf7 (n ≈ 10,000 monoclonal samples) within hours, enabling a single-day IBD analysis pipeline for large genomic data sets.

CONCLUSION: hmmibd-rs builds upon, accelerates, and enhances hmmIBD for efficient and accurate IBD detection, serving as a crucial tool for advancing large-scale malaria genomic surveillance.
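To illustrate why a recombination rate map matters for segment length filtration, the following Python sketch converts physical positions (bp) to genetic positions (cM) by linear interpolation over a map. It is not taken from hmmibd-rs: the function names, the toy map values, and the clamping behavior at the map ends are all assumptions made for this example.

```python
from bisect import bisect_right


def bp_to_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for a physical position (bp).

    map_bp and map_cm are parallel, ascending lists describing a hypothetical
    recombination map; positions outside the map are clamped to its ends.
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    # Index of the map point at or immediately left of pos_bp.
    i = bisect_right(map_bp, pos_bp) - 1
    frac = (pos_bp - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + frac * (map_cm[i + 1] - map_cm[i])


def segment_length_cm(start_bp, end_bp, map_bp, map_cm):
    """Genetic length (cM) of a segment given its physical endpoints (bp)."""
    return bp_to_cm(end_bp, map_bp, map_cm) - bp_to_cm(start_bp, map_bp, map_cm)


# Toy map with made-up values: a cold spot over 0-100 kb (1 cM in total)
# followed by a hot spot over 100-200 kb (20 cM in total).
MAP_BP = [0, 100_000, 200_000]
MAP_CM = [0.0, 1.0, 21.0]

# Genome-wide average rate, as a uniform-rate model would assume.
UNIFORM_CM_PER_BP = MAP_CM[-1] / MAP_BP[-1]

# Two segments with the same physical length (60 kb) in different regions.
cold = segment_length_cm(20_000, 80_000, MAP_BP, MAP_CM)
hot = segment_length_cm(120_000, 180_000, MAP_BP, MAP_CM)
uniform = 60_000 * UNIFORM_CM_PER_BP

print(f"cold spot: {cold:.1f} cM, hot spot: {hot:.1f} cM, uniform: {uniform:.1f} cM")
```

In this toy setting, a uniform rate assigns both 60 kb segments the same genetic length, whereas the map gives the cold-spot segment a much shorter length and the hot-spot segment a much longer one. A length filter applied in cM therefore treats the two segments differently. This is the kind of effect the abstract describes for filtration in cold and hot spots.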
