Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms

利用快速高效的算法进行全基因组多基因评分推断

阅读:1

Abstract

With improved whole-genome sequencing and variant imputation techniques, modern genome-wide association studies (GWASs) have enriched our understanding of the landscape of genetic associations for thousands of disease phenotypes. However, translating the marginal associations for millions of genetic variants to integrated polygenic risk scores (PRSs) that capture their joint effects on the phenotype remains a major challenge. Due to technical and statistical constraints, commonly used PRS methods in this setting either perform heuristic pruning and thresholding or overlook most genetic association signals by restricting inference to small variant sets, such as HapMap3. Here, we present a set of algorithmic improvements and compact data structures that enable scaling summary-statistics-based PRS inference to tens of millions of variants while avoiding numerical instabilities common in such high-dimensional settings. These enhancements consist of highly compressed linkage-disequilibrium (LD) matrix format, which integrates with streamlined and parallel coordinate-ascent updating schemes. When incorporated into our existing PRS method (VIPRS), the proposed algorithms yield over 50-fold reductions in storage requirements and lead to orders-of-magnitude improvements in runtime and memory efficiency. The updated VIPRS software can now perform variational Bayesian regression over 1.1 million HapMap3 variants in under a minute. Using this scalable implementation, we applied VIPRS to 75 of the most heritable, continuous phenotypes in the UK Biobank, leveraging marginal associations for up to 18 million bi-allelic variants. These experiments demonstrated that VIPRS is 1-2 orders of magnitude more efficient than popular baselines while being competitive with the best-performing methods in terms of prediction accuracy.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。