Abstract
Accurate monitoring and prediction of chlorophyll-a (Chl-a) concentrations are critical for protecting desalination systems from algal blooms. This study presents a framework based on a Graph Neural Network Transformer (GNN-T) model to predict the spatiotemporal dynamics of Chl-a in semi-enclosed marine environments such as the Persian Gulf, a key region for desalination facilities in the Middle East. The GNN-T model integrates critical environmental variables drawn from MODIS/Aqua and ERA5 datasets comprising 300,000 observations. Demonstrating robust generalizability, the model achieves a test R² of 0.906 in the Gulf of Mexico, outperforming conventional deep learning approaches, including CNN-LSTM, BiLSTM, Temporal-Relational GNN, and AGTCNSD. Statistical error metrics confirm that GNN-T delivers higher predictive accuracy and lower errors than these baselines. Global sensitivity and uncertainty analysis (GSUA) identifies sea surface temperature, normalized fluorescence line height, and particulate organic carbon as the key drivers. Convergent Cross-Mapping (CCM) elucidates nonlinear causal relationships, distinguishing correlation from mechanistic causality. In addition, a causality-driven ablation study guided by CCM and Sobol sensitivity analyses streamlined the model to the 13 most influential variables, achieving a test R² of 0.882 with a 25% reduction in computational cost and thereby improving operational efficiency without substantial loss of predictive accuracy. Uncertainty quantification using Monte Carlo dropout provides 95% confidence intervals. Quartile analysis sets bloom thresholds at the 50th (bloom), 75th (intense bloom), and 90th (extreme bloom) percentiles for probabilistic risk assessment. The model thus serves as an effective operational tool for detecting the onset of algal blooms and mitigating their economic impacts.
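The percentile-based bloom classes and the Monte Carlo dropout intervals described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions not stated in the abstract: thresholds are computed from a single historical Chl-a record (the abstract does not say whether this is per pixel, per region, or pooled), a value exactly at a threshold counts as belonging to the higher class, and the 95% interval is taken as the empirical 2.5th to 97.5th percentile of stochastic forward passes (the paper may instead use mean ± 1.96σ). All function names are hypothetical, and the random draws merely stand in for GNN-T predictions with dropout active.

```python
import numpy as np


def bloom_thresholds(chl_a_history):
    """Hypothetical helper: 50th, 75th and 90th percentiles of a Chl-a record."""
    p50, p75, p90 = np.percentile(chl_a_history, [50, 75, 90])
    return p50, p75, p90


def classify_bloom(values, thresholds):
    """Label values: 0 = no bloom, 1 = bloom, 2 = intense, 3 = extreme.

    np.digitize (right=False) returns i with thresholds[i-1] <= x < thresholds[i],
    so a value equal to a threshold is assigned to the higher class (an assumption).
    """
    return np.digitize(values, thresholds)


def interval_95(samples):
    """Empirical 95% interval from stochastic (e.g. MC dropout) predictions."""
    lo, hi = np.percentile(samples, [2.5, 97.5])
    return lo, hi


def exceedance_probability(samples, threshold):
    """Fraction of stochastic predictions at or above a bloom threshold."""
    return float(np.mean(np.asarray(samples) >= threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in historical record (mg/m^3); real data would come from MODIS/Aqua.
    history = rng.lognormal(mean=0.0, sigma=0.6, size=5000)
    p50, p75, p90 = bloom_thresholds(history)

    # Stand-in for 200 MC dropout forward passes at one location and time.
    draws = rng.normal(loc=p75, scale=0.3, size=200)
    lo, hi = interval_95(draws)
    print("thresholds:", p50, p75, p90)
    print("95% interval:", lo, hi)
    print("P(intense or worse):", exceedance_probability(draws, p75))
    print("class of mean prediction:", classify_bloom([draws.mean()], (p50, p75, p90)))
```

Expressing risk as an exceedance probability over the dropout draws, rather than classifying only the mean prediction, is one way to turn the intervals into the probabilistic bloom alerts the abstract describes.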