Abstract
We present an open-access quantum-chemistry database of more than 14,500 API-like molecules and their degradation products, all optimized at the M06-2X/6-31G(d) level of theory. The data set provides a comprehensive suite of thermochemical and quantum-chemical descriptors (Gibbs free energy, enthalpy, electronic energy, vibrational frequencies, and Cartesian geometries) suited to large-scale modeling. Using these data, we trained and validated three machine-learning models (XGBoost, Random Forest, and Multi-Layer Perceptron) for rapid, accurate prediction of Gibbs free energy and enthalpy. The models are bundled in ThermoPred, an open-source Python package that offers a scalable, computationally efficient alternative to traditional quantum-chemical calculations. All data sets, models, and source code are freely available to support reproducibility and community-driven development.