Abstract
Rare natural hazards, such as severe storms, flooding, and wildfires, are difficult to model because high-quality observational data are scarce. This scarcity produces imbalanced datasets in which high-impact events are severely underrepresented, which reduces the effectiveness of machine learning (ML) models. In this study, we propose a framework that uses stable diffusion to generate physically consistent synthetic storm events for data enrichment. Our method generates paired input-output samples, ensuring that each synthetic meteorological field is meaningfully aligned with its corresponding impact. A variational autoencoder compresses 19-channel storm fields into a latent space, where a cluster-conditioned diffusion model generates events aligned with outage severity. Synthetic events are then filtered using evaluation metrics that assess distributional similarity and physical consistency. Enriching the training data with the screened synthetic events significantly improved ML-based outage prediction, reducing the centered root-mean-square error (CRMSE) by 39% and increasing R² by 11% and the Nash-Sutcliffe efficiency (NSE) by over 200%.
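The three reported scores measure different aspects of prediction skill, so it may help to see them side by side. The sketch below uses standard definitions and rests on a few assumptions. CRMSE is taken to be the root-mean-square error after removing each series' mean. R² is taken to be the squared Pearson correlation, which keeps it distinct from NSE. NSE is the Nash-Sutcliffe efficiency. Improvements are computed as percent change relative to the baseline. The abstract does not state these conventions, and the function names are hypothetical, not taken from the authors' code.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def crmse(obs, pred):
    """Centered RMSE: RMSE of the anomalies, so a constant bias is ignored."""
    mo, mp = _mean(obs), _mean(pred)
    return math.sqrt(_mean([((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)]))


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions (assumed definition)."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (variance sum of observations)."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def pct_change(baseline, enriched):
    """Percent change relative to the baseline score (hypothetical reporting convention)."""
    return 100.0 * (enriched - baseline) / abs(baseline)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    biased = [2.0, 3.0, 4.0, 5.0]  # perfect pattern, constant +1 bias
    print(crmse(obs, biased), r_squared(obs, biased), nse(obs, biased))
```

For the biased example, CRMSE is 0 and R² is 1 because neither metric penalizes a constant offset. NSE, however, drops to 0.2, since it scores absolute errors against the variance of the observations. This difference is one reason the three metrics are typically reported together. A percent change in NSE can also be very large when the baseline NSE is close to zero.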