Abstract
Identifying highly efficacious, broad-spectrum antibodies against fast-mutating viral variants remains a major challenge in therapeutic development. Here, we developed AbGen, a machine learning-assisted antibody generation pipeline powered by an antibody language model (AbLM), to accelerate antibody screening and redesign. AbLM, pretrained on protein domain sequences and fine-tuned on paired VH-VL sequences, enables the analysis and prediction of neutralization activity against viruses (SARS-CoV-2 in this study), covering both the wild-type virus (through antigen interaction prediction by docking) and emerging variants (through Gaussian process regression, also known as Kriging). Screening over 1300 RBD-binding IgG sequences from convalescent patients, AbGen efficiently prioritized candidates for experimental validation and/or redesign against the wild-type virus and the Delta and Omicron variants, yielding antibodies that prevented viral infection in vitro and in vivo. AbLM outperformed other language models in predicting IgGs with low variant susceptibility. Our work advances artificial intelligence-based antibody discovery by synergizing data-driven language models and Kriging with physics-driven docking and design.
