Abstract
Given the challenges in enhanced oil recovery (EOR) from carbonate reservoirs, this study investigates the application of machine learning models to predict contact angles under Smart Water Assisted Foam (SWAF) injection. SWAF, an innovative and environmentally friendly approach, improves sweep efficiency by altering reservoir rock wettability and controlling gas mobility. The primary objective is to develop accurate models for predicting contact angle from input features comprising ion concentrations (NaHCO₃, NaCl, CaCl₂, MgCl₂, KCl, Na₂SO₄), pressure, temperature, and contact time. The initial dataset comprised 1,615 samples from validated sources and was expanded to 13,032 samples using data augmentation techniques, including Gaussian noise, scaling transformations, and physically constrained Generative Adversarial Networks (GANs). The models examined were a Convolutional Neural Network (CNN), XGBoost, Support Vector Regression (SVR), Random Forest, and a Multilayer Perceptron (MLP). Data were split into training, validation, and test sets in a 60:20:20 ratio, and hyperparameters were optimized using grid search combined with 5-fold cross-validation. XGBoost achieved the best performance (RMSE = 0.09°; R² = 0.99997), followed by Random Forest (RMSE = 0.13°; R² = 0.99994), SVR (RMSE = 0.32°; R² = 0.99963), MLP (RMSE = 0.39°; R² = 0.99944), and CNN (RMSE = 1.48°; R² = 0.997). Although the CNN serves as a novel deep learning benchmark for exploring complex pattern recognition in tabular data, tree-based models such as XGBoost outperformed it on this SWAF dataset, highlighting the value of ensemble methods. Feature importance analysis revealed the dominant influence of MgCl₂ (28%) and Na₂SO₄ (22%), whereas pressure had the least effect (~3%). Temperature exhibited an inverse relationship with contact angle. These models provide powerful tools for optimizing SWAF processes and reducing the need for costly experimental work.
Limitations include the exclusion of crude oil properties and field-scale data. Future work should extend these models with dynamic and field-scale datasets to improve their practical applicability in the oil industry.
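As an illustration of the evaluation protocol summarized above, the following minimal sketch shows one way a 60:20:20 split, 5-fold cross-validation folds, and the two reported metrics (RMSE and R²) could be computed. It is not the study's implementation: the function names (`split_60_20_20`, `k_fold_indices`, `rmse`, `r_squared`), the random seed, and the decision to run the folds over the training indices are all assumptions made for illustration, and the sketch uses only the Python standard library rather than the modeling libraries used in the study.

```python
import math
import random


def split_60_20_20(n_samples, seed=0):
    """Shuffle sample indices and split them 60:20:20 (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test


def k_fold_indices(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (degrees here)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # 13,032 is the augmented dataset size stated in the abstract.
    train, val, test = split_60_20_20(13032)
    print(len(train), len(val), len(test))
    for fit, held_out in k_fold_indices(train, k=5):
        print(len(fit), len(held_out))
```

In a grid search, each candidate hyperparameter setting would be fitted on `fit` and scored on `held_out` for every fold, with the validation and test indices reserved for model selection and final reporting.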