Abstract
Data assimilation (DA) integrates observations with model forecasts to produce optimized estimates of the atmospheric state, whose physical consistency is critical for stable weather forecasting and reliable climate research. Traditional Bayesian DA methods enforce the underlying nonlinear, flow-dependent physical constraints through empirical, tunable covariance structures, which offer limited accuracy and robustness. Here, we introduce latent DA (LDA), a framework that performs Bayesian DA in a latent space learned from multivariate global atmospheric data via an autoencoder. We demonstrate that the autoencoder largely captures nonlinear physical relationships, enabling LDA to produce balanced analyses without explicitly modeling physical constraints. Compared to traditional model-space DA, assimilation in latent space also improves both analysis quality and forecast skill under both idealized and real observational settings. Furthermore, LDA remains robust across a range of latent dimensions and stays effective even when the autoencoder is trained on inaccurate but physically realistic forecasts, highlighting its flexibility for real-world applications.
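
For orientation, one way a latent-space analysis could be written is as a variational problem. This is a minimal sketch under assumptions not stated in the abstract: a 3D-Var-style cost function, an encoder $E$ and decoder $D$ of the autoencoder, a background (forecast) state $x_b$ with latent code $z_b = E(x_b)$, a latent background-error covariance $B_z$, observations $y$ with observation operator $H$, and an observation-error covariance $R$. All symbols are illustrative.

```latex
\[
J(z) = \tfrac{1}{2}\,(z - z_b)^{\mathsf T} B_z^{-1} (z - z_b)
     + \tfrac{1}{2}\,\bigl(y - H(D(z))\bigr)^{\mathsf T} R^{-1} \bigl(y - H(D(z))\bigr),
\qquad
z_a = \arg\min_{z} J(z), \quad x_a = D(z_a).
\]
```

In a formulation of this kind, the decoder $D$, rather than a hand-tuned model-space covariance, would carry the multivariate relationships into the analysis $x_a$.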