Abstract
BACKGROUND: Influenza forecasting in Xinjiang is critical but is hindered by two distinctive challenges: complex, multimodal routine seasonality and extreme structural breaks following the COVID-19 pandemic. This study develops a two-stage hybrid model that fuses local epidemiological features to provide accurate forecasts where general-purpose models may fail.

METHODS: We use data from 2011 to 2023 provided by the Disease Prevention and Control Center of the Xinjiang Production and Construction Corps (Urumqi, Xinjiang, China). We stratify these data into pre-pandemic and COVID-19-affected periods to test model performance against both routine seasonal complexity and major structural breaks. We propose a two-stage Long Short-Term Memory-Gradient Boosting Regression (LSTM-GBR) model. An LSTM first extracts an epidemic trend baseline from historical case counts, and a GBR then fuses this baseline with engineered virological and demographic features. We systematically compare its performance on key metrics against a broad set of benchmarks, including classical statistical models, automated forecasting tools, and state-of-the-art pre-trained foundation models.

RESULTS: The proposed LSTM-GBR model outperforms all benchmarks and yields the lowest error. It is the only model that accurately resolves the complex bimodal seasonality in the pre-pandemic test set. It is also the only model that anticipates and forecasts the extreme post-COVID-19 rebound in 2023. All baseline models, including the state-of-the-art foundation models, fail to predict these critical dynamics.

CONCLUSION: The proposed LSTM-GBR model provides a validated, robust, and high-accuracy forecasting tool tailored to Xinjiang. It overcomes the limitations of general-purpose models, which cannot capture the region's complex local dynamics and emergent structural breaks. This localized feature-fusion approach is essential for strengthening public health surveillance in Xinjiang.
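The two-stage design summarized under METHODS (a trend baseline fed, together with auxiliary features, into a gradient boosting regressor) can be illustrated with a small, self-contained sketch. This is not the authors' implementation. To avoid deep-learning dependencies, it assumes a trailing moving average as a stand-in for the LSTM trend baseline. It also uses a hand-rolled least-squares gradient boosting regressor built from decision stumps in place of a library GBR. The names `trend_baseline`, `fit_stump`, `TinyGBR`, the `positivity` feature, and the synthetic data are all hypothetical.

```python
"""Two-stage "trend baseline + gradient boosting" sketch (illustrative only).

Assumptions, not taken from the paper: a trailing moving average stands in for
the LSTM trend extractor, and stage 2 is a hand-rolled least-squares gradient
boosting regressor built from depth-1 regression trees (stumps).
"""
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left_value, right_value)


def trend_baseline(cases: Sequence[float], window: int = 4) -> List[float]:
    """Stage 1 stand-in: mean of the `window` observations before week t.

    Only past values are used, so the baseline is available at forecast time.
    """
    out: List[float] = []
    for t in range(len(cases)):
        past = cases[max(0, t - window):t]
        out.append(sum(past) / len(past) if past else 0.0)
    return out


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Pick the single split that minimises squared error on the residuals r."""
    n = len(r)
    best = None
    for j in range(len(X[0])):
        thresholds = sorted({row[j] for row in X})[:-1]  # keep both sides non-empty
        for thr in thresholds:
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant leaf
        mean = sum(r) / n
        return (0, float("inf"), mean, mean)
    return best[1:]


class TinyGBR:
    """Least-squares gradient boosting: each stump is fitted to the current
    residuals, i.e. the negative gradient of the half squared-error loss."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.init_ = 0.0
        self.stumps_: List[Stump] = []

    @staticmethod
    def _stump_value(s: Stump, x: Sequence[float]) -> float:
        j, thr, lv, rv = s
        return lv if x[j] <= thr else rv

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "TinyGBR":
        self.init_ = sum(y) / len(y)  # start from the mean of the targets
        pred = [self.init_] * len(y)
        self.stumps_ = []
        for _ in range(self.n_estimators):
            resid = [yi - pi for yi, pi in zip(y, pred)]
            s = fit_stump(X, resid)
            self.stumps_.append(s)
            pred = [p + self.learning_rate * self._stump_value(s, x) for p, x in zip(pred, X)]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        out: List[float] = []
        for x in X:
            p = self.init_
            for s in self.stumps_:
                p += self.learning_rate * self._stump_value(s, x)
            out.append(p)
        return out


if __name__ == "__main__":
    import math

    # Synthetic weekly series (NOT the Xinjiang surveillance data).
    weeks = list(range(104))
    # Hypothetical virological feature (e.g. a test-positivity rate).
    positivity = [0.1 + 0.08 * math.sin(2 * math.pi * w / 52) for w in weeks]
    cases = [50 + 400 * p for p in positivity]
    base = trend_baseline(cases, window=4)
    # Stage 2 input: the trend baseline fused with auxiliary features.
    X = [[b, p, w % 52] for b, p, w in zip(base, positivity, weeks)]
    model = TinyGBR(n_estimators=30).fit(X[:78], cases[:78])
    print([round(v, 1) for v in model.predict(X[78:82])])
```

Swapping `trend_baseline` for the output of a trained LSTM and `TinyGBR` for a library regressor such as scikit-learn's `GradientBoostingRegressor` would bring the sketch closer to the structure described above. Keeping the baseline strictly causal (past weeks only) avoids leaking future information into the stage-2 features.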