Abstract
In this study, we extend a previously introduced QM-AI strategy for predicting halogen···π interaction energies from a single aromatic model (representing phenylalanine) to multiple biologically relevant aromatic environments. Neural network models were developed for halogen···π interactions involving phenol, imidazole, and indole, which serve as model systems for the aromatic side chains of tyrosine, histidine, and tryptophan, respectively. Large, systematically generated datasets of halobenzene-aromatic complexes (over 18 million interaction geometries in total) were evaluated at the MP2/TZVPP level of theory and represented by compact geometric descriptors to train residue-specific neural networks. Across all systems, the models reproduce the quantum-mechanical reference energies with high accuracy (R² > 0.98 and RMSE < 0.5 kJ/mol) within the targeted σ-hole interaction domain and retain robust performance on independent test sets of randomly generated and PDB-derived geometries. Model limitations are primarily associated with geometric arrangements outside the training distribution, such as π···π, C-H···π, or other non-σ-hole interaction motifs. Augmenting the training data with additional randomly generated geometries further improved model robustness and generalization without modifying the underlying network architecture. Overall, this work establishes a scalable and transferable QM-AI strategy for the rapid and accurate prediction of halogen···π interaction energies across diverse aromatic environments, providing near-quantum-mechanical accuracy at negligible computational cost and supporting future applications in structure-based drug design.
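For readers who want to check comparable accuracy figures on their own data, the two metrics quoted above can be computed as in the minimal sketch below. It assumes that R² denotes the coefficient of determination (it may instead be reported as a squared Pearson correlation) and that energies are given in kJ/mol. The function names and the example values are hypothetical and are not taken from this work.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between reference and predicted energies."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0.0:
        raise ValueError("reference energies must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical interaction energies in kJ/mol (illustrative values only).
qm_reference = [-10.0, -5.0, 0.0, 5.0]
nn_predicted = [-9.5, -5.5, 0.5, 4.5]

print("RMSE (kJ/mol):", rmse(qm_reference, nn_predicted))
print("R^2:", r_squared(qm_reference, nn_predicted))
```

Both functions take plain sequences of floats, so the same calls can be applied separately to each residue-specific test set.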