Abstract
Fragment-Based Drug Discovery (FBDD) is a powerful strategy with a proven track record of generating potent bioactive small molecules from low-affinity chemical fragments. Computational approaches to FBDD are often limited by the scarcity of high-quality, structurally resolved data on fragment binding poses. To address this gap, we introduce the Structurally Augmented Fragment Repository (SAFR), a novel data set designed to support in silico FBDD. First, we obtained 89,375 high-confidence binding poses of bioactive molecules from public sources. To do this, we applied a filtering protocol that combines 2D ligand similarity and 3D ligand superposition against protein-bound ligand structures, followed by scoring with protein-ligand docking and interaction features. Fragmenting the bioactive ligands in their predicted binding poses yielded a total of 818,385 fragment-protein interactions between 157,080 unique chemical fragments and environments from 1,142 distinct proteins. Of these, 270,155 are unique fragment-protein interactions, and 237,284 of those (88%) are not represented among protein-bound ligands in the PDB. We present case studies applying SAFR to bioisosteric replacement and scaffold hopping. SAFR is a useful resource for supporting fragment screening campaigns and hit-to-lead optimization. It is publicly available at https://zenodo.org/records/18229523.