Graph-Based Deep Learning Models for Predicting pK(a) Values of Protein-Ionizable Residues via Physically Inspired Feature Engineering

Authors: Ziyu Song, Ruixuan Wang, Xun Jiao, Zuyi Huang

Journal: Journal of Chemical Information and Modeling
Impact factor: 5.300
Published: 2026 Feb 9; 66(3):1742-1756
DOI: 10.1021/acs.jcim.5c01681

Abstract

The pK(a) value of an ionizable protein residue reflects its tendency to donate a proton at a given pH and is essential for understanding a wide range of biological activities. Accurate prediction of residue pK(a) values is therefore crucial for understanding enzymatic activity and protein-ligand binding, both of which are fundamental to drug discovery. Although substantial time and resources have been invested in computational methods for protein residue pK(a) prediction, the accuracy of existing tools, such as the widely used PROPKA, remains limited. In this study, we propose an integrated framework that combines molecular dynamics simulations with deep learning models to improve the prediction accuracy of pK(a) values for ionizable residues. Specifically, we use high-throughput molecular modeling with the AMOEBA polarizable force field to construct a protein structure data set enriched with atomic electrostatics and other physics-inspired features. Using experimentally determined pK(a) values from the PKAD-2 data set, we trained three graph-based neural network models. Compared with PROPKA3.5.1, all three models substantially improved prediction accuracy across four ionizable residue types (aspartic acid, glutamic acid, lysine, and histidine), and the model based on graph attention networks showed both high accuracy and strong generalizability when benchmarked against several recently published machine learning models. Beyond these gains in predictive performance, feature importance analysis of the best-performing models revealed physically meaningful patterns in the descriptive features that align with the biophysical principles governing residue pK(a) values, most notably the complexity of the local microenvironment and the geometric arrangement of atoms within the protein structure.
Together, the trained pK(a) models and the curated, dipole moment-enhanced data set built with a polarizable force field offer a valuable resource for the research community, with potential applications in early-stage drug target identification and protein engineering.
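Why pK(a) accuracy matters at a given pH can be illustrated with the Henderson-Hasselbalch relation, pH = pK(a) + log10([deprotonated]/[protonated]), which gives the protonated fraction of a site as 1 / (1 + 10^(pH - pK(a))). The sketch below is illustrative only and is not part of the authors' framework: it assumes a single, independent titratable site (real protein residues can titrate cooperatively), and the function names `fraction_protonated` and `net_charge_contribution`, as well as the pK(a) values used, are hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Protonated fraction of one isolated titratable site (Henderson-Hasselbalch).

    Assumption: the site titrates independently of all other sites.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge_contribution(residue: str, pka: float, ph: float) -> float:
    """Average side-chain charge of one residue at the given pH (hypothetical helper)."""
    f = fraction_protonated(pka, ph)
    if residue in ("ASP", "GLU"):
        # Acids: protonated form is neutral, deprotonated form carries -1
        return -(1.0 - f)
    if residue in ("LYS", "HIS"):
        # Bases: protonated form carries +1, deprotonated form is neutral
        return f
    raise ValueError(f"unsupported residue type: {residue}")


if __name__ == "__main__":
    # Hypothetical pK(a) values, chosen only to show sensitivity near pH 7
    for pka in (6.5, 7.5):
        frac = fraction_protonated(pka, 7.0)
        print(f"HIS pKa={pka}: protonated fraction at pH 7.0 = {frac:.3f}")
```

Running the script prints protonated fractions of about 0.240 and 0.760: for a histidine titrating near physiological pH, a one-unit shift in the predicted pK(a) flips which protonation state dominates, which is why prediction error at this scale matters for modeling binding and catalysis.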
