Leveraging protein language models to identify complex trait associations with previously inaccessible classes of functional rare variants

Abstract

Protein language models (PLMs) improve variant effect predictions, but their role in gene discovery for complex traits remains unclear. We introduce an allelic series-based regression test that uses PLM-derived variant effect predictions as proxies for effect sizes, identifying ∼46% more associations than standard burden tests. Extending this to isoform-level analysis, we find 26 gene-trait pairs with stronger associations in non-canonical versus canonical transcripts, highlighting isoform-specific effects. Finally, we identify evolutionary plausible variants (EPVs), missense variants assigned higher likelihoods than the wild-type alleles by PLMs, representing 0.45% of missense variants. EPVs show higher allele frequencies than synonymous variants, consistent with differential selection pressures, and are linked to nine traits, including protective associations with low-density lipoprotein (LDL) and bone mineral density. Together, our results demonstrate how PLMs can enhance rare-variant interpretation and gene-trait association discovery in exome data.
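The two ideas in the abstract that lend themselves to a concrete illustration are (i) using PLM-derived scores as per-variant weights in a gene-level regression, in contrast to a standard burden test that weights all qualifying variants equally, and (ii) flagging EPVs as variants to which the PLM assigns a higher likelihood than the wild-type allele. The Python sketch below is only a minimal illustration of these ideas and is not the authors' allelic series test. It assumes that each variant has a PLM log-likelihood ratio (LLR, variant versus wild type), that more negative LLRs indicate more damaging variants, and that the trait is quantitative; the function names `plm_weights`, `burden_scores`, `ols_slope_t`, and `is_epv` are hypothetical.

```python
import numpy as np


def plm_weights(llr):
    """Map PLM log-likelihood ratios (variant vs. wild type) to weights.

    Assumption (not from the source): more negative LLR means more
    damaging, so weight = max(-LLR, 0).
    """
    return np.clip(-np.asarray(llr, dtype=float), 0.0, None)


def burden_scores(genotypes, weights=None):
    """Collapse a (samples x variants) allele-count matrix to one score per sample.

    With weights=None every variant counts equally (standard burden);
    otherwise each variant is scaled by its weight.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    if weights is None:
        weights = np.ones(genotypes.shape[1])
    return genotypes @ np.asarray(weights, dtype=float)


def ols_slope_t(x, y):
    """Slope and t-statistic from a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept + score
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - 2)  # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])


def is_epv(llr):
    """Flag variants the PLM scores as more likely than wild type (LLR > 0)."""
    return np.asarray(llr, dtype=float) > 0


# Toy data, invented for illustration only.
genotypes = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1],
                      [1, 1, 0],
                      [0, 0, 0]])
llr = np.array([-2.0, 0.3, -0.5])
trait = np.array([1.9, 0.1, 0.6, 2.2, 0.0])

unweighted = burden_scores(genotypes)
weighted = burden_scores(genotypes, plm_weights(llr))
print("standard burden:", ols_slope_t(unweighted, trait))
print("PLM-weighted:   ", ols_slope_t(weighted, trait))
print("EPV flags:      ", is_epv(llr))
```

In this sketch the second variant has a positive LLR, so it is flagged as an EPV and contributes nothing to the weighted score, whereas the standard burden score counts it like any other variant. A real analysis would also adjust for covariates and account for relatedness and population structure, none of which is modelled here.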
