S(2)ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning



Abstract

Antibodies safeguard our health by binding precisely and potently to specific antigens, and they have shown promising therapeutic efficacy against numerous diseases, including COVID-19. Recent advances in biomedical language models have shown great potential for interpreting complex biological structures and functions. However, existing antibody-specific models share a notable limitation: they do not explicitly consider antibody structural information, even though the 1-dimensional sequence and the 3-dimensional structure carry unique and complementary insights into antibody behavior and functionality. This paper proposes the Sequence-Structure multi-level pre-trained Antibody Language Model (S(2)ALM), which combines holistic sequential and structural information in one unified, generic antibody foundation model. We construct a hierarchical pre-training paradigm that incorporates 2 customized multi-level training objectives to facilitate the modeling of comprehensive antibody representations. S(2)ALM's representation space uncovers inherent functional binding mechanisms, biological evolution properties, and structural interaction patterns. Pre-trained over 75 million sequences and 11.7 million structures, S(2)ALM can be adopted for diverse downstream tasks: accurately predicting antigen-antibody binding affinities, precisely distinguishing B cell maturation stages, identifying crucial antibody binding positions, and specifically designing novel coronavirus-binding antibodies. S(2)ALM outperforms well-established baselines and sets new state-of-the-art performance across extensive antibody-specific understanding and generation tasks. Its ability to model comprehensive and generalized representations positions it to advance real-world therapeutic antibody development, potentially addressing unmet academic, industrial, and clinical needs.
