Abstract
The demand for uncertainty quantification in modern sequence modeling has prompted deep integration of Bayesian inference with Transformer architectures, yet existing methods still face systematic engineering challenges in key areas: probabilistic treatment of the attention mechanism, propagation of uncertainty through residual connections, and the decoupling of epistemic and aleatoric uncertainty. This study proposes the Residual Bayesian Attention (RBA) framework, which achieves end-to-end probabilistic inference through three tightly coupled components: Bayesian feedforward layers establish differentiable propagation of parameter-level uncertainty; multi-layer residual Bayesian attention embeds radial basis function kernels into the attention computation and introduces adaptive residual weights modeled by Beta distributions; and a Bayesian covariance construction module generates valid (positive semidefinite) covariance representations through outer-product operations and eigenvalue correction. Systematic evaluation on benchmark datasets spanning six domains, including engineering optimization, time series forecasting, and spatial modeling, shows that RBA achieves stable uncertainty quantification on medium-scale structured data, with particular strength in prediction interval calibration. Notably, through objective evaluation on challenging tasks such as complex physical systems, this study also identifies shared limitations of current deep learning methods in modeling multi-physics coupled systems, offering empirical guidance for future work in this area. RBA thus provides a systematic engineering integration of Bayesian inference and the Transformer architecture, and a methodological contribution to principled uncertainty quantification in deep sequence modeling with clearly delineated applicability boundaries.