PROTAC-PatentDB: A PROTAC Patent Compound Dataset



Abstract

Proteolysis-targeting chimeras (PROTACs) are an emerging and promising class of molecules for targeted protein degradation, with the potential to overcome critical bottlenecks in traditional small-molecule drug development. However, the scarcity of publicly available compound structure data has significantly hindered computational drug discovery and AI-aided drug discovery/design (AIDD) in this field. Patents are an important but underutilized source of novel chemical structures in medicinal chemistry. In this study, we collected PROTAC patents published between 2013 and 2023, together with the chemical structures disclosed in them. Through manual screening and expert curation, we identified 63,136 unique PROTAC compounds across 590 patent families, covering 252 targets. We also used the ADMETlab 3.0 platform to predict 120 physicochemical properties for every compound. The dataset is publicly available on the Figshare platform, and an online webserver (http://protacpatentdb.com) has been established. Because the PROTAC patent literature is growing rapidly, the dataset can be expanded as new patents are published.
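The headline counts above (unique compounds, patent families, targets) are the kind of summary a user might want to recompute after downloading the data. The following is a minimal sketch of that tally, not the authors' pipeline. The column names (`compound_id`, `smiles`, `patent_family`, `target`) and the sample rows are hypothetical placeholders, and the actual Figshare files may be organized differently. Uniqueness here is plain string equality on the SMILES field, which only works if the structures are already canonicalized. Real data would normally be passed through a cheminformatics toolkit first.

```python
import csv
import io

# Hypothetical layout: these column names and rows are illustrative only,
# not taken from the published dataset. The SMILES are toy molecules.
SAMPLE_CSV = """compound_id,smiles,patent_family,target
C1,CCO,WO2020A,BRD4
C2,CCN,WO2020A,BRD4
C3,CCO,WO2021B,AR
C4,CCC,WO2021B,BTK
"""


def summarize(rows):
    """Count distinct structures, patent families, and targets.

    Assumes the SMILES strings are already canonical, so that identical
    compounds have identical strings.
    """
    compounds, families, targets = set(), set(), set()
    for row in rows:
        compounds.add(row["smiles"].strip())
        families.add(row["patent_family"].strip())
        targets.add(row["target"].strip())
    return {
        "compounds": len(compounds),
        "patent_families": len(families),
        "targets": len(targets),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)
```

With a real export, the in-memory sample would be replaced by an opened file handle passed to `csv.DictReader`, and the field names adjusted to match the file's header.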
