Fine-tuned protein language model identifies antigen-specific B cell receptors from immune repertoires



Abstract

Scalable identification of antigen-specific antibodies from whole immune repertoire V(D)J sequences is a central challenge in biomedical engineering. We show that protein language models (PLMs) fine-tuned on antibody heavy-chain sequences can directly predict antigen specificity from unselected immune repertoires. We assessed our model, Antigen Specificity Predictor (ASPred), against SARS-CoV-2, influenza, and HIV-AIDS antigens, observing comparable predictive performance. In the whole immune repertoire V(D)J sequences of mice immunized with the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein, ASPred identified antibody sequences specific to RBD. Several candidate sequences were validated experimentally, including one expressed as a heavy chain-only nanobody with a dissociation constant of 20.7 nM. Molecular dynamics simulations supported the predicted interactions at both coarse-grained and atomic levels. Benchmarking against Barcode-Enabled Antigen Mapping (BEAM) B cell receptor sequence data showed highly significant overlap with ASPred predictions, suggesting that the approach is scalable. The predicted SARS-CoV-2 binders differed substantially from the training sequences, demonstrating generalization beyond sequence memorization. Together, these results establish that antibody heavy-chain sequences encode sufficient information for PLMs to infer specificity, offering a scalable framework for antibody discovery with broad applications.
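
The abstract does not state which test was used to assess the overlap between ASPred predictions and the BEAM data. As a minimal sketch, one common way to quantify such an overlap is an upper-tail hypergeometric test, which is assumed here purely for illustration. The function name `overlap_p_value`, its parameters, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from math import comb


def overlap_p_value(population: int, predicted: int, reference: int, shared: int) -> float:
    """Upper-tail hypergeometric probability P(X >= shared).

    population: total number of sequences considered (hypothetical)
    predicted:  sequences flagged as binders by a model
    reference:  sequences flagged as binders by a reference assay
    shared:     sequences flagged by both
    """
    if min(population, predicted, reference, shared) < 0:
        raise ValueError("counts must be non-negative")
    if predicted > population or reference > population:
        raise ValueError("subset sizes cannot exceed the population")
    if shared > min(predicted, reference):
        raise ValueError("overlap cannot exceed either subset")

    # Number of ways to draw the reference set from the population.
    total = comb(population, reference)
    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    upper = min(predicted, reference)
    tail = sum(
        comb(predicted, k) * comb(population - predicted, reference - k)
        for k in range(shared, upper + 1)
    )
    return tail / total


# Toy numbers only, not data from the paper: a perfect overlap of 5 out of 5
# in a population of 10 has probability 1/252 under random sampling.
p = overlap_p_value(population=10, predicted=5, reference=5, shared=5)
print(f"p = {p:.6f}")
```

A small p-value here would mean that the two sets share more sequences than expected if the predictions were drawn at random from the repertoire. For repertoire-scale counts, a library implementation of the hypergeometric survival function may be preferable to exact integer arithmetic.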
