Learning a refinement model for variant analysis in non-human primate genomes



Abstract

BACKGROUND: Accurate variant calling is essential for genomic studies but depends heavily on sequence alignment (SA) quality. In non-human primates, the lack of well-curated variant resources limits alignment postprocessing, leading to suboptimal SA and more miscalls. DeepVariant, a leading variant caller, is highly accurate on human genomes, but its performance degrades under suboptimal SA conditions.

RESULTS: To address this, we developed a decision tree-based refinement model that integrates alignment quality metrics and DeepVariant confidence scores to filter miscalls. We defined optimal SA and suboptimal SA by the presence or absence of postprocessing steps, and confirmed that suboptimal SA significantly increases miscalls in both human and rhesus macaque genomes. Applied to human suboptimal SA, the refinement model reduced the miscalling ratio (MR) by 52.54%. Applied to rhesus macaque genomes, it reduced the MR by 76.20%, showing its potential for non-human primate studies. Alternative base ratio (ABR) analysis further revealed that the model refines homozygous SNVs more effectively than heterozygous SNVs, improving the reliability of variant classification.

CONCLUSIONS: Our refinement model significantly improves variant calling under suboptimal SA conditions, which is particularly beneficial for non-human primate studies, where alignment postprocessing is often limited. We packaged the model into the Genome Variant Refinement Pipeline (GVRP), providing a tool for researchers working on population genetics and molecular evolution. This work establishes a framework for enhancing variant calling accuracy in species with limited genomic resources.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-025-11921-2.
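The abstract reports MR reductions and ABR results without giving formulas, so the following is only a minimal sketch of how such quantities could be computed. It assumes that MR is the fraction of called variants that are miscalls, that the reported reduction is relative to the pre-refinement MR, and that ABR is the share of reads at a site supporting the alternative allele. These definitions, the function names, and the counts are illustrative assumptions, not taken from the paper or from GVRP.

```python
def miscalling_ratio(miscalls: int, total_calls: int) -> float:
    """Assumed definition: fraction of called variants that are miscalls."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return miscalls / total_calls


def relative_reduction_pct(mr_before: float, mr_after: float) -> float:
    """Relative drop in MR, as a percentage of the pre-refinement MR."""
    if mr_before <= 0:
        raise ValueError("mr_before must be positive")
    return (mr_before - mr_after) / mr_before * 100.0


def alternative_base_ratio(ref_reads: int, alt_reads: int) -> float:
    """Assumed definition: alt-supporting reads over all reads at the site."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


if __name__ == "__main__":
    # Made-up counts for illustration only (not data from the paper).
    mr_before = miscalling_ratio(miscalls=200, total_calls=10_000)
    mr_after = miscalling_ratio(miscalls=90, total_calls=9_890)
    print(f"MR reduction: {relative_reduction_pct(mr_before, mr_after):.2f}%")
    print(f"ABR: {alternative_base_ratio(ref_reads=14, alt_reads=16):.2f}")
```

Under these assumed definitions, a heterozygous SNV would be expected to show an ABR near 0.5 and a homozygous alternative SNV an ABR near 1.0, which is one way an ABR analysis can separate the two classes.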
