Abstract
Computational epitope prediction remains an unmet need in therapeutic antibody development. We present three complementary approaches for predicting epitope relationships from antibody amino acid sequences. First, we analyze ~18 million antibody pairs targeting ~250 protein families and establish that, among antibodies sharing both heavy- and light-chain V-genes, a threshold of >70% CDRH3 sequence identity reliably predicts overlapping-epitope antibody pairs. Next, we develop a supervised contrastive fine-tuning framework for antibody large language models that yields embeddings more strongly correlated with epitope information than those of the pretrained models. Applying this contrastive learning approach to SARS-CoV-2 receptor binding domain antibodies, we achieve 82.7% balanced accuracy in distinguishing same-epitope from different-epitope antibody pairs, and we show that training on functional epitope bins enables prediction of relative levels of structural overlap (Spearman ρ = 0.25). Finally, we create AbLang-PDB, a generalized model for predicting overlapping-epitope antibodies across a broad range of protein families. AbLang-PDB achieves a five-fold improvement in average precision for predicting overlapping-epitope antibody pairs compared with sequence-based methods, and it effectively predicts the extent of epitope overlap among overlapping-epitope pairs (ρ = 0.81). In an antibody discovery campaign searching for antibodies whose epitopes overlap that of the HIV-1 broadly neutralizing antibody 8ANC195, 70% of computationally selected candidates showed HIV-1 specificity, and 50% competed with 8ANC195 for binding. Together, the computational models presented here provide powerful tools for epitope-targeted antibody discovery and demonstrate the efficacy of contrastive learning for improving epitope representation.
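As a minimal sketch of the first approach's decision rule, the Python below flags a pair as overlapping-epitope when both the heavy- and light-chain V-genes match and CDRH3 identity exceeds 70%. How CDRH3 identity is computed is not specified in this summary; the sketch assumes a position-wise (Hamming-style) identity that only compares CDRH3s of equal length and treats unequal lengths as non-matching. The `Antibody` record, the function names, and the example sequences are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    """Hypothetical minimal antibody record used only for this sketch."""
    heavy_v_gene: str  # e.g. "IGHV1-69"
    light_v_gene: str  # e.g. "IGKV3-20"
    cdrh3: str         # CDRH3 amino acid sequence


def cdrh3_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two CDRH3 sequences.

    Assumption: identity is position-wise and defined only for
    equal-length sequences; unequal lengths return 0.0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def predict_overlapping(ab1: Antibody, ab2: Antibody, threshold: float = 0.70) -> bool:
    """Predict an overlapping-epitope pair: shared heavy and light V-genes
    plus CDRH3 identity strictly greater than the threshold."""
    same_v_genes = (
        ab1.heavy_v_gene == ab2.heavy_v_gene
        and ab1.light_v_gene == ab2.light_v_gene
    )
    return same_v_genes and cdrh3_identity(ab1.cdrh3, ab2.cdrh3) > threshold


if __name__ == "__main__":
    a = Antibody("IGHV1-69", "IGKV3-20", "ARDGYSSGWYFDY")
    b = Antibody("IGHV1-69", "IGKV3-20", "ARDGYSSGWYFDV")  # 12/13 positions identical
    c = Antibody("IGHV1-69", "IGKV1-39", "ARDGYSSGWYFDY")  # different light V-gene
    print(predict_overlapping(a, b))  # True
    print(predict_overlapping(a, c))  # False
```

Because the rule requires matching V-genes before comparing CDRH3s, it acts as a high-precision filter; pairs that bind overlapping epitopes through different germlines are not captured, which is the gap the learned embedding models are meant to address.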
