Abstract
This study explores how well machine learning and structural fingerprints can predict two spectroscopic properties of ice: OH vibrational frequencies and ¹H chemical shifts. A large theoretical data set (55 ice polymorphs, with 1010 DFT data points each for the vibrations and the NMR shifts) and a smaller cross-validation set are employed. The Message Passing Atomic Cluster Expansion (MACE) model performs best, reaching a root-mean-square deviation (RMSD) of 0.06 ppm for the chemical shifts and approximately 10 cm⁻¹ for the vibrational frequencies. Simpler descriptors such as ACSF and SOAP, when paired with suitable regressors, nearly match MACE's performance. At the other end of the complexity scale, the simplest possible physics-based descriptor of the environment (a single H-bond distance) yields RMSD values three times as large for the vibrations and four times as large for the proton chemical shift as those of the MACE model. Depending on the context, these errors may still be modest enough to be useful, given the gain in simplicity and transparency.
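To make the single-descriptor end of the complexity scale concrete, the following is a minimal sketch of how one property could be regressed on a single H-bond distance and scored by RMSD. The linear functional form, the helper names `fit_line` and `rmsd`, and all numerical values are assumptions made for illustration only; they are not the model or the data of the study.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder values, NOT taken from the study:
# H-bond distances in angstrom and OH stretching frequencies in cm^-1.
hbond_distance = [1.70, 1.75, 1.80, 1.85, 1.90]
oh_frequency = [3150.0, 3215.0, 3260.0, 3330.0, 3370.0]

# Fit the one-descriptor model and evaluate it on the same toy points.
slope, intercept = fit_line(hbond_distance, oh_frequency)
predicted = [slope * d + intercept for d in hbond_distance]
print(f"RMSD of the one-descriptor fit: {rmsd(predicted, oh_frequency):.1f} cm^-1")
```

Taking the factors quoted above at face value, the single-distance descriptor would correspond to RMSD values of roughly 30 cm⁻¹ for the vibrations and roughly 0.24 ppm for the proton chemical shift.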