Abstract
Debris flow is a highly concentrated, non-homogeneous fluid triggered by snow and ice melt and by heavy rainfall. Its formation and movement are intricate processes, and debris flow susceptibility assessment is crucial for disaster warning and mitigation. Because traditional methods struggle to predict debris flows precisely, machine learning algorithms have been applied increasingly in this field in recent years. In this paper, a debris flow susceptibility assessment model is constructed from RF (Random Forest) and XGBoost (Extreme Gradient Boosting) models combined with the Stacking ensemble learning method, and the SPY technique is introduced to optimize negative sample selection. The results show that the SPY-RF model, with an AUC value of 0.93, substantially outperforms the original RF model (AUC 0.82) and performs well across all susceptibility levels, particularly in the very high susceptibility zone, which has the highest debris flow density. Likewise, the SPY-XGBoost model's AUC value of 0.87 exceeds the original XGBoost model's 0.72. This suggests that the SPY technique improves the prediction accuracy and reliability of the models and is especially effective at reducing the misclassification of non-prone areas. In contrast, the high correlation among base-learner features prevented the Stacking-RF and Stacking-XGBoost models from improving prediction performance any further; their AUC values were 0.80 and 0.71, respectively. The factor contribution analysis indicates that the main factors influencing debris flow susceptibility are SPI, rainfall, curvature, and area. Of these, SPI contributes the most, indicating the critical role that water flow intensity plays in the formation of debris flow. This study demonstrates the benefits of integrating the SPY technique with ensemble learning. It also examines the shortcomings of the Stacking model in debris flow prediction, offering a valuable avenue for future research on optimizing model diversity and enhancing prediction performance.
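For readers unfamiliar with SPY-based negative sample selection, the sketch below illustrates the general idea as it is commonly described in positive-unlabeled learning: a fraction of known positive samples ("spies") is hidden in the unlabeled set, a classifier is trained on positives versus unlabeled samples, and unlabeled samples that score below nearly all spies are kept as reliable negatives. This is only a minimal illustration, not the configuration used in this paper. The function names, the `spy_ratio` and `spy_tolerance` parameters, and the toy centroid-based scorer (standing in for RF or XGBoost) are all assumptions made for the example.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def positive_score(x, pos_c, unl_c):
    """Toy stand-in for a classifier's P(positive | x); NOT the paper's RF/XGBoost.

    Returns a value in [0, 1) that grows as x moves closer to the positive
    centroid and away from the unlabeled centroid.
    """
    d_pos, d_unl = sq_dist(x, pos_c), sq_dist(x, unl_c)
    return d_unl / (d_pos + d_unl + 1e-12)


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15,
                           spy_tolerance=0.05, seed=0):
    """Select reliable negatives from `unlabeled` (hypothetical helper).

    Requires at least two positive samples. `spy_ratio` is the share of
    positives used as spies; `spy_tolerance` is the share of spies allowed
    to fall below the threshold. Both defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = positives[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_spies = min(len(shuffled) - 1, max(1, round(spy_ratio * len(shuffled))))
    spies, kept_pos = shuffled[:n_spies], shuffled[n_spies:]

    # Step 1: "train" positives vs. (unlabeled + spies).
    pos_c = centroid(kept_pos)
    unl_c = centroid(unlabeled + spies)

    # Step 2: set the threshold so that most spies score at or above it.
    spy_scores = sorted(positive_score(s, pos_c, unl_c) for s in spies)
    cut = min(len(spy_scores) - 1, int(spy_tolerance * len(spy_scores)))
    threshold = spy_scores[cut]

    # Step 3: unlabeled samples scoring below the threshold are reliable negatives.
    reliable = [u for u in unlabeled
                if positive_score(u, pos_c, unl_c) < threshold]
    return reliable, threshold


# Synthetic two-feature data, purely for illustration.
positives = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(10)]
unlabeled = [[0.1 * i, 0.2 * i] for i in range(10)] + [[5.2, 5.1], [4.9, 5.3]]
reliable, threshold = spy_reliable_negatives(positives, unlabeled, spy_ratio=0.2)
print(len(reliable), "reliable negatives at threshold", round(threshold, 3))
```

In practice the toy scorer would be replaced by the predicted probabilities of the chosen base model, and the reliable negatives would then be paired with the known debris flow samples to train the final susceptibility model.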