Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a method that combines graph neural network models with semiempirical quantum mechanical (QM) features to achieve high accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark data sets, with root-mean-square errors between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis shows that the QM features play a crucial role in both the reaction site enumeration and micro-pKa prediction models. QupKake thus offers an accurate, generalizable tool for micro-pKa prediction across these application areas.