Enhancing CYP3A4 Inhibition Prediction Using a Hybrid GNN-ML Model with Data Augmentation

Abstract

Background/Objectives: Cytochrome P450 3A4 (CYP3A4) metabolizes approximately 30-50% of clinically used drugs, so accurate prediction of CYP3A4 inhibition is essential for early assessment of drug-drug interaction (DDI) risk and toxicity. This study evaluated an integrated artificial intelligence framework for predicting CYP3A4 inhibition (%) using a large, curated chemical dataset.

Methods: A dataset of 23,713 compounds was compiled from the Korea Chemical Bank and multiple commercial and public databases. Two model families were evaluated: vector-based machine learning (ML) models (LightGBM, XGBoost, CatBoost, and a weighted ML ensemble) and graph neural network (GNN) models (O-GNN with contrastive learning and manifold mixup (O-GNN + CL + Mixup), D-MPNN, GINE, and GATv2). Manifold mixup was applied during GNN training, and SMILES enumeration-based test-time augmentation was used at inference. The best-performing ML and GNN models were integrated using a weighted ensemble strategy. Model interpretability was examined using SHAP analysis for the ML models and occlusion sensitivity analysis for O-GNN + CL + Mixup.

Results: The weighted ML ensemble achieved the best performance among the ML models (RMSE = 19.1031, Pearson correlation coefficient (PCC) = 0.7566), and O-GNN + CL + Mixup performed best among the GNN models (RMSE = 20.1002, PCC = 0.7265). The hybrid model further improved predictive accuracy (RMSE = 19.0784, PCC = 0.7570). External validation on 100 newly generated experimental data points confirmed generalizability (Custom Metric = 0.8035).

Conclusions: Integrating ML and GNN models with data augmentation strategies improves the robustness and interpretability of CYP3A4 inhibition prediction and provides a practical framework for metabolic screening and DDI risk assessment.
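The abstract reports RMSE and PCC for a weighted ensemble of the best ML and GNN models but does not say how the weight was chosen. The sketch below illustrates one plausible reading: a single blending weight picked by grid search on validation RMSE. The function names (`blend`, `best_weight`), the grid search itself, and the toy numbers are all assumptions for illustration, not the authors' procedure or data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted inhibition (%)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def blend(ml_pred, gnn_pred, w_ml):
    """Convex combination: w_ml * ML prediction + (1 - w_ml) * GNN prediction."""
    return [w_ml * m + (1.0 - w_ml) * g for m, g in zip(ml_pred, gnn_pred)]

def best_weight(y_val, ml_val, gnn_val, steps=20):
    """Hypothetical weight selection: grid search over [0, 1] minimizing validation RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_val, blend(ml_val, gnn_val, w)))

# Toy validation data (made up): the two models err in opposite directions,
# so an equal-weight blend cancels their errors.
y_val = [10.0, 50.0, 90.0, 30.0]
ml_val = [20.0, 40.0, 80.0, 40.0]
gnn_val = [0.0, 60.0, 100.0, 20.0]

w = best_weight(y_val, ml_val, gnn_val)
hybrid = blend(ml_val, gnn_val, w)
print(f"w_ml={w:.2f}  RMSE={rmse(y_val, hybrid):.4f}  PCC={pearson(y_val, hybrid):.4f}")
```

In practice the weight would be fixed on held-out validation data and then applied unchanged to the test set. The gain from blending depends on how weakly the two models' errors are correlated, which is consistent with the small improvement reported for the hybrid model over the ML ensemble alone.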
