Abstract
Proteolysis-targeting chimeras (PROTACs) are a promising emerging class of molecules for targeted protein degradation that have the potential to overcome critical bottlenecks in traditional small-molecule drug development. However, the scarcity of publicly available data on compound structures has significantly hindered computational drug discovery and AI-aided drug discovery/design (AIDD) in this field. Patents are an important but underutilized source of novel chemical structures in medicinal chemistry. In this study, we collected PROTAC patents published from 2013 to 2023 and the chemical structures disclosed therein. Through manual screening and expert curation, we identified 63,136 unique PROTAC compounds across 590 patent families, along with 252 targets. Additionally, we used the ADMETlab 3.0 platform to predict 120 physicochemical properties for all compounds. The dataset is publicly available on the Figshare platform, and an online web server (http://protacpatentdb.com) has also been established. Because the PROTAC patent literature is growing rapidly, the dataset can be expanded as new patents are published.