Abstract
Groundwater levels in large agricultural irrigation districts typically exhibit strong spatiotemporal variability owing to heterogeneous hydrological conditions, geological formations, and human activities. This variability complicates groundwater management and underscores the need for efficient, high-resolution prediction of groundwater levels. To improve computational efficiency without compromising the accuracy of MODFLOW, this study proposes a novel surrogate modeling framework, SRR-LSTM, for predicting groundwater levels at a 1-km grid scale. The core innovation of the framework is its grid-clustering strategy: K-means clustering is coupled with LSTM networks to group grids with similar physical features, hydrological features, and groundwater-level dynamics, thereby improving prediction accuracy. A case study in the Taobei Irrigation District, Northeast China, shows that SRR-LSTM improves computational efficiency by approximately 80% relative to the physics-based model while attaining a Nash–Sutcliffe Efficiency (NSE) above 0.9 for 96% of the grids. This performance surpasses that of the three baseline schemes, which reach NSE values above 0.9 in only 11% to 49% of the grids. Furthermore, SHAP is employed to reveal the spatial heterogeneity of input-variable contributions and to quantify the combined effects of streamflow and human activities on groundwater dynamics.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-37618-4.
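The grid-clustering step described above can be illustrated with a minimal pure-Python K-means sketch. The feature vectors, feature names, and cluster count here are hypothetical placeholders, not the paper's actual inputs; in the proposed framework, each cluster of grids would subsequently receive its own LSTM trained on that cluster's groundwater-level series.

```python
def kmeans(points, k, iters=50):
    """Minimal K-means: group grid feature vectors into k clusters.

    Initialization uses the first k points (a simplification; real
    applications typically use random or k-means++ initialization).
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each grid to its nearest cluster center (squared distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Recompute each center as the mean of its member grids.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

# Hypothetical normalized per-grid features:
# (elevation, distance to river, mean groundwater level)
grids = [
    (0.10, 0.20, 0.10),
    (0.20, 0.10, 0.15),
    (0.90, 0.80, 0.95),
    (0.80, 0.90, 0.90),
]
labels, centers = kmeans(grids, k=2)
# Grids sharing a label would be modeled by one cluster-specific LSTM.
```

This sketch covers only the clustering stage; the surrogate-model stage (one LSTM per cluster, predicting groundwater levels from meteorological and human-activity inputs) is omitted for brevity.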