Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Networks with Group Lasso Regularization

Abstract

Explainable artificial intelligence approaches accelerate drug discovery by improving molecular representation learning, identifying key molecular structures, and rationalizing drug property prediction. However, developing end-to-end explainable models for structure-activity relationship modeling in target-specific compound property prediction remains challenging due to the limited availability of compound-protein interaction data for individual targets and the fact that small changes in chemical substituents or local structural motifs can lead to large differences in molecular properties. Thus, optimally leveraging structural and property information and identifying key moieties related to compound-protein affinity for specific targets is essential. We propose a framework implementing graph neural networks (GNNs) to leverage property and structure information from pairs of molecules with activity cliffs targeting specific proteins to predict compound-protein affinity (i.e., half-maximal inhibitory concentration, IC50) and explain property differences. To enhance model explainability, we trained GNNs with structure-aware loss functions using group lasso and sparse group lasso regularizations, which prune and highlight molecular subgraphs relevant to activity differences. We applied this framework to the activity cliff data of molecules targeting 6 tyrosine-protein kinases across Src, Abl, and Tec families, as well as anaplastic lymphoma kinase. Integrating common- and uncommon-node information with sparse group lasso improves molecular property prediction for specific protein targets, as evidenced by lower root mean square errors and higher Pearson's correlation coefficients. Applying regularizations also enhances feature attribution for GNNs by boosting graph-level global direction scores and improving atom-level coloring accuracy. These advances strengthen model interpretability in drug discovery pipelines, particularly in identifying critical molecular substructures in lead optimization.
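The abstract names group lasso and sparse group lasso regularization without giving the loss itself. As a rough illustration only, the sketch below computes the textbook sparse group lasso penalty over a set of per-atom values. The function name `sparse_group_lasso`, the parameters `lam` and `alpha`, and the choice to treat the common and uncommon atoms of a molecule pair as the two groups are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def sparse_group_lasso(values, groups, lam=0.01, alpha=0.5):
    """Textbook sparse group lasso penalty (hypothetical helper, not from the paper).

    penalty = lam * ((1 - alpha) * sum_g sqrt(p_g) * ||v_g||_2 + alpha * ||v||_1)

    values : per-atom values (1-D) or per-atom embeddings (2-D, one row per atom)
    groups : list of index lists; each list selects the atoms of one group
    alpha  : 0 gives pure group lasso, 1 gives pure (elementwise) lasso
    """
    values = np.asarray(values, dtype=float)
    group_term = 0.0
    for idx in groups:
        block = values[idx]
        # sqrt(group size) keeps large groups from being under-penalized
        group_term += np.sqrt(block.size) * np.linalg.norm(block)
    l1_term = np.abs(values).sum()
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)

# Assumed toy setup: atoms 0-1 form one group, atoms 2-3 another.
atom_values = [3.0, 4.0, 0.0, 0.0]
atom_groups = [[0, 1], [2, 3]]
group_only = sparse_group_lasso(atom_values, atom_groups, lam=1.0, alpha=0.0)
lasso_only = sparse_group_lasso(atom_values, atom_groups, lam=1.0, alpha=1.0)
```

In this toy case the second group is entirely zero and contributes nothing to the group term, which is the behavior that lets a group penalty switch off whole subgraphs while the L1 term sparsifies individual entries within the remaining groups.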
