Abstract
Adsorption energy is a fundamental property in catalysis and chemical reaction studies. Conventional quantum chemistry methods, such as density functional theory, are highly accurate but often computationally expensive, or even impractical, for screening large data sets or complex chemical systems. In this work, we propose a machine learning (ML) pipeline that efficiently predicts relative interaction energies for molecular adsorption near the minimum molecule-cluster distance, at a fraction of the computational cost of quantum chemistry-based methods. Our approach begins by transforming output data from the Fritz-Haber Institute ab initio materials simulation (FHI-aims) package into feature arrays through a modified version of the Smooth Overlap of Atomic Positions (SOAP) descriptor, which we call Cut-SOAP. This modification reduces the dimensionality of the features by more than 97% while preserving most of the inherent quality of the data. With this method, we construct a large adsorption data set of more than 430,000 entries using real-world data. We then train a deep neural network on this data set and analyze the influence of architectural and hyperparameter choices on both computational cost and predictive accuracy. The model achieves a mean absolute error below 0.1 eV on the standard test set. To rigorously assess its generalization for real-world applications, we evaluate it on a challenging out-of-distribution data set, where it maintains a mean absolute error below 1.0 eV. The trained model can make thousands of predictions in seconds, demonstrating the effectiveness of the pipeline for rapid screening. These results highlight the benefits of ML-based approaches for materials screening, which offer accessible, efficient, and accurate tools for predicting relative interaction energies. This capability is a crucial step toward the accelerated discovery and optimization of catalytic systems.
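The two headline quantities above, the mean absolute error (MAE) and the percentage reduction in feature dimensionality, are simple to compute. The following is a minimal sketch of both in plain Python, not the pipeline's actual evaluation code. The function names (`mean_absolute_error`, `reduction_percent`) and all numerical values are hypothetical and chosen only for illustration; none of them are taken from this work.

```python
def mean_absolute_error(e_pred, e_ref):
    """Return the mean absolute error between predicted and reference
    energies. Both sequences must use the same unit (for example, eV)."""
    if len(e_pred) != len(e_ref):
        raise ValueError("e_pred and e_ref must have the same length")
    if not e_ref:
        raise ValueError("at least one energy value is required")
    return sum(abs(p - r) for p, r in zip(e_pred, e_ref)) / len(e_ref)


def reduction_percent(n_full, n_reduced):
    """Return the percentage by which a descriptor shrinks when going
    from n_full features to n_reduced features."""
    if n_full <= 0:
        raise ValueError("n_full must be positive")
    return 100.0 * (1.0 - n_reduced / n_full)


# Illustrative values only (assumptions, not results from this work).
e_ref = [-1.20, -0.45, 0.10, -2.05]   # reference energies in eV
e_pred = [-1.15, -0.50, 0.18, -2.00]  # model predictions in eV
mae = mean_absolute_error(e_pred, e_ref)

# Hypothetical descriptor sizes: a full vector of 10,000 features
# trimmed to 250 features.
shrink = reduction_percent(10_000, 250)

print(f"MAE = {mae:.4f} eV, dimensionality reduction = {shrink:.1f}%")
```

In this toy example, any reduced descriptor that keeps fewer than 3% of the original features corresponds to a reduction of more than 97%.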