Abstract
This study predicts the viscosity of ionic liquid systems using advanced regression models and a dataset of 8,500 entries. The input variables comprise categorical features (Cation and Anion), which represent the structure of the ionic liquid, and numerical variables (Temperature, T, and xIL). Preprocessing consisted of Leave-One-Out encoding of the categorical variables, outlier removal with Isolation Forest, and Min-Max normalization. Four regression models were implemented: Spline Regression (SPR), Twin Support Vector Regression (TSVR), Adaptive Lasso (ALASSO), and Neural Oblivious Decision Ensembles (NODE). Hyperparameters were optimized using the Firefly Algorithm. The NODE model gave the best fit, with the highest cross-validation R² of 0.99536 (±0.00124), a training R² of 0.99728, and a test R² of 0.99721, together with the lowest test RMSE (0.0031499) and test MAE (0.0022219). The SPR model ranked second, with a cross-validation R² of 0.96940 (±0.00303), a test RMSE of 0.01393, and a test MAE of 0.003869. TSVR showed moderate performance, with a cross-validation R² of 0.85577 and a test RMSE of 0.01752, while ALASSO was the least effective, with a cross-validation R² of 0.78169 and a test RMSE of 0.02507. These results highlight the importance of robust preprocessing and identify NODE as the most accurate and reliable of the four models for predicting viscosity in complex ionic liquid datasets.
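As a rough illustration of two of the preprocessing steps named above, the following minimal sketch implements Leave-One-Out encoding and Min-Max normalization in plain Python. It is not the authors' code: the function names, the toy cation labels, and the numeric values are hypothetical, and falling back to the global mean for a category that occurs only once is an assumption made here. The Isolation Forest outlier-removal step is omitted; in practice it would typically rely on a library implementation.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Replace each category label with the mean target of the OTHER rows
    that share the same label (the current row is left out)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption: a category seen only once gets the global mean.
            encoded.append(global_mean)
    return encoded


def min_max_scale(values):
    """Linearly rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data, not taken from the study's dataset.
cations = ["C4mim", "C4mim", "C2mim", "C2mim", "C6mim"]
viscosity = [0.10, 0.30, 0.20, 0.40, 0.50]
temperature = [293.15, 313.15, 293.15, 333.15, 303.15]

cation_encoded = leave_one_out_encode(cations, viscosity)
temperature_scaled = min_max_scale(temperature)
```

Because the encoding uses the target, it should be fitted on training rows only; unseen rows would normally receive the plain per-category training mean rather than a leave-one-out value.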