Abstract
Predicting the adsorption of organic compounds on biochar and polymer resins is important for advancing industrial and energy technologies. This study used a dataset of 1750 adsorption isotherms covering 73 organic compounds on 50 biochar samples and 30 polymer resins. Machine learning models were built on eight input parameters: five Abraham solvation descriptors, total pore volume (Vt), specific surface area (BET), and equilibrium concentration (logCe). The output parameter was the adsorption degree (logKd). The dataset was split into training (1225 data points), testing (262), and validation (263) sets. The methods evaluated were Linear Regression, Ridge Regression, Lasso Regression, Elastic Net, Support Vector Regression (SVR), k-Nearest Neighbors (KNN), Decision Trees, Random Forests, Gradient Boosting Machines, Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), and Gaussian Processes, together with the gradient-boosting implementations XGBoost, LightGBM, and CatBoost. These three gradient-boosting implementations gave the highest accuracy: XGBoost (R² = 0.974, MSE = 0.0343), LightGBM (R² = 0.964, MSE = 0.0484), and CatBoost (R² = 0.984, MSE = 0.0212), where MSE is the mean squared error. Simpler models such as Linear Regression and Elastic Net performed worse, with R² values ranging from 0.678 to 0.875 and correspondingly higher MSE values. Sensitivity and SHAP analyses identified equilibrium concentration and specific surface area as the most influential factors for adsorption. The findings show that machine learning methods, particularly XGBoost, LightGBM, and CatBoost, can predict adsorption with high accuracy while indicating which variables drive the adsorption behavior.
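As an illustration of the data partition and the two reported metrics, the following minimal sketch reproduces the 1225/262/263 split and defines MSE and R² in plain Python. The 70/15/15 proportions are inferred from the stated counts, and the shuffling scheme, seed, and function names (`split_indices`, `mse`, `r2`) are assumptions made for illustration, not the procedure used in the study.

```python
import random

def split_indices(n, train_pct=70, test_pct=15, seed=0):
    """Shuffle range(n) and cut it into train/test/validation index lists.

    Percentages are integers so the cut points use exact integer arithmetic;
    the validation set receives whatever remains. (Assumed scheme.)
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

train, test, val = split_indices(1750)
print(len(train), len(test), len(val))  # prints "1225 262 263"
```

In use, `y_true` would hold the measured logKd values of the held-out set and `y_pred` the model predictions for the same rows.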