Machine learning-based prediction of human structural variation and characterization of associated sequence determinants

Abstract

Structural variants (SVs) are a major source of genetic diversity and play key roles in human disease and evolution. Yet the extent to which local sequence context shapes the likelihood of SV formation remains poorly quantified. Here, we develop machine learning models to predict the occurrence of SVs across the human genome and to characterize the genomic determinants associated with their formation. We build both a sequence-only convolutional neural network (CNN) and a random forest model that integrates diverse genomic annotations. Each model achieves high predictive performance on its own (AUROC >90%), and combining the two in an ensemble improves performance further. This predictive ability demonstrates that SV-prone regions can be accurately inferred from sequence context. Model interpretability techniques reveal key genomic contributors to SVs, including the effects of sequence motifs such as microhomology and non-canonical DNA structures, as well as the presence of SV hotspots. We find that different classes of SVs exhibit distinct sequence determinants, with transposable elements and inversions displaying particularly distinctive signatures. Moreover, predicted SV probability correlates with allele frequency and gene functional constraint, indicating that the model may be useful for variant effect prediction. These findings demonstrate that machine learning models trained on local sequence features can identify unstable genomic regions, and they provide a framework for quantifying SV susceptibility and SV effects in personalized genomics.
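
As an illustration only, the ensemble and AUROC evaluation mentioned above might be sketched as follows. The abstract does not state how the two models are combined, so simple averaging of predicted probabilities is assumed here. The function names (`auroc`, `ensemble_mean`) and the toy labels and scores are hypothetical; they are made up to exercise the code and are not results from the study.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each member the average rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ensemble_mean(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Assumed ensembling rule: unweighted mean of two probability vectors."""
    if len(p_a) != len(p_b):
        raise ValueError("both models must score the same regions")
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


# Toy data (hypothetical): 1 = region overlaps an SV, 0 = control region.
labels = [1, 1, 1, 0, 0, 0]
cnn_probs = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]  # stand-in for CNN outputs
rf_probs = [0.7, 0.4, 0.8, 0.3, 0.5, 0.2]   # stand-in for random forest outputs
ens_probs = ensemble_mean(cnn_probs, rf_probs)

for name, probs in [("cnn", cnn_probs), ("rf", rf_probs), ("ensemble", ens_probs)]:
    print(f"{name}: AUROC = {auroc(labels, probs):.3f}")
```

In this contrived example each model misranks a different region, so averaging corrects both errors. Whether an ensemble helps on real data depends on how correlated the two models' errors are.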
