Abstract
This study investigates the effectiveness of inclined double cutoff walls installed beneath hydraulic structures using five machine learning models: Random Forest (RF), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). A dataset of 630 samples was gathered from previous studies. The input variables were the relative distance between the cutoff wall and the structure's apron width ($L/B$), the inclination angle ratio between downstream and upstream cutoffs ($\theta_2/\theta_1$), the depth ratio of downstream to upstream cutoff walls ($d_2/d_1$), and the downstream cutoff depth relative to the permeable layer depth ($d_2/D$). The outputs were the relative uplift force ($U/U_o$), the relative exit hydraulic gradient ($i_R/i_{Ro}$), and the relative seepage discharge per unit structure length ($q/q_o$). The dataset was split 70:30 into training and testing sets. Hyperparameters were tuned using Bayesian Optimization (BO) coupled with five-fold cross-validation. The CatBoost model outperformed the other models, with $R^2$ values above 0.95, 0.93, and 0.97 for $U/U_o$, $i_R/i_{Ro}$, and $q/q_o$, respectively, and RMSE values below 0.022, 0.089, and 0.019 for the same variables. A feature importance analysis was conducted using SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDPs). The analysis revealed that $L/B$ was the most influential predictor for $U/U_o$ and $i_R/i_{Ro}$, while $d_2/D$ played a crucial role in determining $q/q_o$. Moreover, the PDPs illustrated a positive linear relationship between $L/B$ and $U/U_o$, a V-shaped impact of $d_2/d_1$ on $i_R/i_{Ro}$ and $q/q_o$, and complex nonlinear interactions for $\theta_2/\theta_1$ across all target variables.
Furthermore, an interactive Graphical User Interface (GUI) was developed, enabling engineers to efficiently predict output variables and apply model insights in practical scenarios.
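
The evaluation protocol summarized above (a 70:30 split of 630 samples, five-fold cross-validation on the training portion, and scoring by $R^2$ and RMSE) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names, the fixed random seed, and the shuffling strategy are assumptions made for illustration, and the metrics use the standard textbook definitions, which the abstract does not spell out.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.30, seed=0):
    """Shuffle sample indices and split them into (train, test) lists.

    The seed and shuffling are illustrative assumptions; the abstract
    only states the 70:30 ratio.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 630 samples split 70:30 gives 441 training and 189 testing samples.
train_idx, test_idx = train_test_split_indices(630)
print(len(train_idx), len(test_idx))

# Five folds over the 441 training samples (validation sizes 88 or 89).
for fold_train, fold_val in k_fold_indices(train_idx, k=5):
    print(len(fold_train), len(fold_val))
```

In a hyperparameter search of the kind described, each candidate configuration would be scored by averaging a metric such as `rmse` over the five validation folds, and the held-out 30% would be used only for the final reported $R^2$ and RMSE.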