Abstract
The ability of machine learning approaches to predict hydrogen bond energies from partial charges, bond orders, bond distances, and element types was investigated. Support vector regression in combination with gradient boosting yielded a mean absolute percentage error of 3%, a significant improvement over previous models. The best models use Löwdin partial charges and bond orders from BLYP or B3LYP with the def2-SVP double-ζ basis set. Furthermore, the semiempirical GFN2-xTB approach can be employed to calculate Mulliken partial charges and Wiberg bond orders as input features of a regression model, resulting in a mean absolute percentage error of 4%. All models were fitted to coupled cluster energies with singles, doubles, and perturbative triples, extrapolated to the complete basis set limit.
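
For reference, the mean absolute percentage error quoted above is conventionally the average of |predicted - reference| / |reference|, expressed in percent. The following is a minimal Python sketch under that standard definition. The function name and the energy values are hypothetical illustrations, not code or data from this work.

```python
def mean_absolute_percentage_error(reference, predicted):
    """Return the MAPE in percent between reference and predicted values."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one value is required")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    # Average the relative absolute deviations, then convert to percent.
    total = sum(abs((p - r) / r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)


# Made-up hydrogen bond energies (kJ/mol), for illustration only.
reference_energies = [-20.0, -25.0, -10.0]
predicted_energies = [-21.0, -24.0, -10.5]

mape = mean_absolute_percentage_error(reference_energies, predicted_energies)
print(f"MAPE = {mape:.2f}%")
```

Because each deviation is divided by its reference value, weak and strong hydrogen bonds contribute on the same relative scale. A reference energy of zero would make the metric undefined, which is why the sketch rejects it.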