Abstract
This study explores the use of machine learning (ML) techniques to predict Fourier-transform infrared (FTIR) intensities of products from the thermal cracking of Athabasca bitumen, with the aim of developing a reliable soft sensor. The ultimate goal is to obtain the FTIR spectra of the thermally cracked products online, replacing slow physical measurements with rapid ML predictions and thereby reducing process time. Six ML models were implemented: linear regression (LinR), partial least squares regression (PLSR), support vector regression (SVR), k-nearest neighbors (k-NN), random forest (RF), and gradient boosting regression (GBR). To assess the models' generalization capabilities, they were trained and tested in four scenarios that partition the data by temperature. The data were obtained from visbreaking experiments performed on Athabasca bitumen at temperatures ranging from 25 to 420 °C with reaction times ranging from 15 min to 27 h. Scenario 1 included all 61,740 data points, using an 80/20 train-test split with 10-fold cross-validation (CV). Scenario 2 involved training on temperatures of 25, 350, and 400 °C and testing on 300, 380, and 420 °C. Scenario 3 involved training on temperatures of 350, 380, and 400 °C and testing on 25, 300, and 420 °C. Finally, Scenario 4 involved training on temperatures of 25, 300, 350, and 380 °C and testing on 400 and 420 °C. Bayesian optimization was employed for hyperparameter tuning to identify the optimal configuration for each model. The results indicate that ensemble methods, particularly GBR, consistently achieved the highest predictive accuracy (R²) and lowest root mean squared error (RMSE) across all scenarios. In Scenario 1, GBR achieved a prediction accuracy of 99.66%.
Scenario 2 highlighted the models' ability to generalize across varying temperatures, with RF and GBR performing similarly at prediction accuracies of around 94%. Scenario 3, characterized by significant temperature variability (its test temperatures lie both below and above the training range), demonstrated the robustness of GBR, which outperformed RF and k-NN with a predictive accuracy of 92.15%. Scenario 4, which predicts the two highest temperatures from training data at lower temperatures, showed that GBR still performed robustly with a predictive accuracy of 80.40%. The study concludes that GBR models, particularly those with well-tuned hyperparameters, are highly effective in predicting FTIR intensities, outperforming other techniques such as RF, k-NN, LinR, and PLSR. The integration of advanced ML techniques and Bayesian optimization significantly enhances the capability to predict FTIR spectra, providing a reliable soft sensor as an alternative to traditional physical experimentation. This approach not only saves time and resources but also delivers consistent, high-quality predictive performance in chemical analysis and monitoring.
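
As an illustration of the temperature-held-out scenarios and the two reported metrics, the following is a minimal sketch in plain Python rather than the authors' implementation. The temperature sets are taken from the scenario definitions in this abstract. The row layout (a dictionary with `temperature` and `intensity` keys) and the function names are hypothetical, since the abstract does not specify the input features or the software used.

```python
import math

# Train/test temperature sets (deg C) for Scenarios 2-4, as listed in the abstract.
# Scenario 1 is a random 80/20 split over all data and is not covered here.
SCENARIOS = {
    2: {"train": {25, 350, 400}, "test": {300, 380, 420}},
    3: {"train": {350, 380, 400}, "test": {25, 300, 420}},
    4: {"train": {25, 300, 350, 380}, "test": {400, 420}},
}


def split_by_temperature(rows, scenario):
    """Partition rows into (train, test) by their 'temperature' value.

    `rows` is a list of dicts; the 'temperature' key is an assumed layout.
    """
    sets = SCENARIOS[scenario]
    train = [r for r in rows if r["temperature"] in sets["train"]]
    test = [r for r in rows if r["temperature"] in sets["test"]]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted intensities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder rows, one per temperature; the intensity values are made up.
    rows = [{"temperature": t, "intensity": 0.1}
            for t in (25, 300, 350, 380, 400, 420)]
    train, test = split_by_temperature(rows, scenario=4)
    print(len(train), "training rows;", len(test), "test rows")
```

Because the split is by temperature rather than by random sampling, every test spectrum comes from an operating condition the model never saw during training, which is what makes Scenarios 2 to 4 a test of generalization rather than of interpolation within seen conditions.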