Abstract
Explainable artificial intelligence approaches accelerate drug discovery by improving molecular representation learning, identifying key molecular structures, and rationalizing drug property prediction. However, developing end-to-end explainable models for structure-activity relationship modeling in target-specific compound property prediction remains challenging for two reasons: compound-protein interaction data for individual targets are scarce, and small changes in chemical substituents or local structural motifs can lead to large differences in molecular properties. It is therefore essential to make full use of structural and property information and to identify the key moieties related to compound-protein affinity for specific targets. We propose a framework that uses graph neural networks (GNNs) to leverage property and structure information from pairs of molecules that form activity cliffs against specific protein targets, in order to predict compound-protein affinity (i.e., half-maximal inhibitory concentration, IC50) and explain property differences. To enhance model explainability, we trained GNNs with structure-aware loss functions using group lasso and sparse group lasso regularizations, which prune and highlight molecular subgraphs relevant to activity differences. We applied this framework to the activity cliff data of molecules targeting six tyrosine-protein kinases across Src, Abl, and Tec families, as well as anaplastic lymphoma kinase. Integrating common- and uncommon-node information with sparse group lasso improves molecular property prediction for specific protein targets, as evidenced by lower root mean square errors and higher Pearson's correlation coefficients. Applying regularizations also enhances feature attribution for GNNs by boosting graph-level global direction scores and improving atom-level coloring accuracy. These advances strengthen model interpretability in drug discovery pipelines, particularly in identifying critical molecular substructures in lead optimization.
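
For readers unfamiliar with the regularizers named above, the following is a minimal sketch of the generic, textbook sparse group lasso penalty, not the structure-aware loss used in this work. The function name, the `lam_group` and `lam_l1` parameters, and the idea that each group holds the weights tied to one molecular substructure are assumptions made purely for illustration.

```python
import math


def sparse_group_lasso_penalty(groups, lam_group=1.0, lam_l1=1.0):
    """Generic sparse group lasso penalty over groups of weights.

    `groups` is a list of lists of floats. Here each inner list is assumed
    (hypothetically) to hold the weights tied to one substructure.
    Setting lam_l1 to 0 gives the plain group lasso penalty.
    """
    group_term = 0.0
    l1_term = 0.0
    for weights in groups:
        # L2 norm of the group, scaled by sqrt(group size) as in the
        # standard formulation; this drives whole groups toward zero.
        l2_norm = math.sqrt(sum(w * w for w in weights))
        group_term += math.sqrt(len(weights)) * l2_norm
        # L1 term adds sparsity within the groups that survive.
        l1_term += sum(abs(w) for w in weights)
    return lam_group * group_term + lam_l1 * l1_term


# Illustrative values only: one active group and one fully pruned group.
example_groups = [[3.0, 4.0], [0.0, 0.0]]
group_only = sparse_group_lasso_penalty(example_groups, lam_group=1.0, lam_l1=0.0)
combined = sparse_group_lasso_penalty(example_groups, lam_group=1.0, lam_l1=1.0)
```

In this form a group whose weights are all zero contributes nothing to either term, which is the behavior that lets such penalties prune entire groups while the L1 term thins out individual weights within the remaining ones.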