Abstract
DNA-encoded libraries (DELs) are powerful tools for initial hit identification, yet the combinatorial chemistries and building block choices used in their construction can restrict chemical space coverage and hit drug-likeness, limiting efficient hit expansion. Generative artificial intelligence (AI), by contrast, can in principle explore drug-like chemical space around any given compound, but it often struggles with the synthesizability of generated molecules and requires a set of validated hits to initiate exploration. Here, we present a synergistic methodology that overcomes these mutual limitations by leveraging experimentally validated DEL data to initialize and bias an AI-powered virtual screening pipeline, expanding initial DEL hits with both de novo and purchasable compounds from ultra-large chemical libraries. Using this approach, we identified novel, commercially available hits from the Enamine REAL Space for the chromatin reader protein 53BP1 and validated them in a time-resolved fluorescence resonance energy transfer (TR-FRET) displacement assay. Three compounds demonstrated TR-FRET IC50 values ≤50 μM, while 11 exhibited IC50 values ≤100 μM. Critically, compared to compounds from the initial DEL selection, the AI-nominated hits exhibited greater chemical diversity and improved drug-likeness, and were readily purchasable off the shelf. This work demonstrates a streamlined platform in which empirical DEL data and generative chemistry models are combined to enable rapid hit expansion from initially screened libraries into diverse, commercially available chemical matter.