A Unified Dataset for Antibody and Nanobody Design Including Sequence, Structure, and Binding Affinity Data

Abstract

The design and optimization of antibodies and nanobodies using deep generative models hold transformative potential for therapeutic and diagnostic applications, but progress is hindered by the fragmented and inconsistent nature of existing datasets. To address these limitations, we introduce the Antibody and Nanobody Design Dataset (ANDD), a unified dataset that integrates sequence, structure, antigen, and affinity data from 15 diverse sources. ANDD comprises 48,683 antibody/nanobody sequences, with structural data for 24,941 entries and antigen sequences for 12,575 entries. We further augmented the affinity data with 2,271 affinity values predicted by ANTIPASTI, a robust model for binding affinity prediction. As a result, ANDD includes 9,557 affinity values, making it the largest dataset to date of antibody/nanobody-antigen pairs with affinity data. By addressing the challenges of data fragmentation and inconsistency, ANDD provides a robust foundation for training deep generative models. With ANDD, such models can better capture antibody/nanobody-antigen interactions and design novel antibodies and nanobodies with improved specificity and efficacy, paving the way for the development of targeted therapeutics.
