Abstract
Antibodies must bind their targets with high affinity and specificity to achieve useful therapeutic activity. They must also possess suitable developability properties (e.g., thermostability, solubility, viscosity, polyreactivity) to ensure favorable manufacturing, formulation, and in vivo performance. Both binding and developability properties are inherent to a given antibody amino acid sequence. Identification or selection of antibodies with suitable binding characteristics is now routine, and de novo computational design models, trained on extensive complementarity-determining region sequence and structural data, are rapidly improving. Developability properties, however, remain difficult to predict, largely because training data are insufficient, and empirical testing is still heavily used to avoid challenges in late-stage antibody development. To fill this gap, we built a high-throughput antibody developability assay platform designed to generate the large datasets needed to train improved machine learning (ML) models. We optimized and automated known developability assays and developed a robust, integrated data analytics pipeline. Here, we report data on 246 antibodies (106 approved, 135 clinical-stage, and 5 preregistration/withdrawn molecules) across a panel of 10 developability assays, in a "tidy data" format suitable for AI/ML modeling. We used these data to develop an XGBoost ML model that predicts similarity to approved antibodies better than the conventional use of developability warning thresholds. Additionally, we confirm that preliminary predictive models improve with more training data. Our high-throughput PROPHET-Ab platform enables data generation at the scale needed to develop improved ML models to predict antibody developability.
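
As an illustration of two ideas mentioned above, the "tidy data" layout and the conventional warning-threshold approach, the following minimal Python sketch reshapes a small wide table into one record per antibody-assay measurement and counts how many thresholds each antibody violates. The antibody names, assay names, values, and threshold limits are hypothetical placeholders chosen for illustration; they are not data or cutoffs from this study, and the sketch does not reproduce the XGBoost model.

```python
# Hypothetical wide-format measurements: one row per antibody, one column per assay.
# Names and values are placeholders, not data from the study.
wide = [
    {"antibody": "mAb-1", "assay_a": 0.2, "assay_b": 71.0},
    {"antibody": "mAb-2", "assay_a": 0.9, "assay_b": 64.5},
    {"antibody": "mAb-3", "assay_a": 0.4, "assay_b": 58.0},
]

# Hypothetical warning thresholds: (limit, direction in which a value is flagged).
thresholds = {
    "assay_a": (0.5, "above"),  # flag values greater than the limit
    "assay_b": (60.0, "below"),  # flag values less than the limit
}


def to_tidy(rows, id_key="antibody"):
    """Melt wide rows into tidy records: one (antibody, assay, value) per record."""
    return [
        {"antibody": row[id_key], "assay": assay, "value": value}
        for row in rows
        for assay, value in row.items()
        if assay != id_key
    ]


def is_flagged(record, limits):
    """Return True if the measurement falls on the warning side of its limit."""
    limit, direction = limits[record["assay"]]
    if direction == "above":
        return record["value"] > limit
    return record["value"] < limit


def count_flags(tidy_records, limits):
    """Count threshold violations per antibody, keeping antibodies with zero flags."""
    counts = {}
    for record in tidy_records:
        counts.setdefault(record["antibody"], 0)
        if is_flagged(record, limits):
            counts[record["antibody"]] += 1
    return counts


tidy = to_tidy(wide)
flags = count_flags(tidy, thresholds)
print(flags)
```

In the tidy layout, adding an assay adds records rather than columns, which is one reason the format is convenient for feeding measurements into ML pipelines. A flag count of this kind is the sort of threshold-based baseline against which a trained classifier can be compared.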