Abstract
Combining bioactivity data from different sources for assays against the same target was recently shown to introduce considerable noise into training data sets for machine learning (ML) models. In this Viewpoint, we address the profound impact of often-overlooked changes to an assay protocol relating to buffer composition and experimental setup. We cover two examples of protein targets that undergo conformational changes driven by extrinsic factors: enzymes as catalytically active proteins, and viral surface proteins as structural targets. We discuss strategies to tackle this challenge for enzyme inhibitors/binders, the utility of models based on deep learning (DL), and current limitations of computational studies assessing protein-ligand interactions. In an interview with an expert in the field of large language models (LLMs) and agentic AI, we explore how the latest developments in these areas can be leveraged to support drug discovery efforts.