Abstract
Bearing fault diagnosis has attracted increasing attention because of its critical role in monitoring the health of rotating machinery. Data-driven models based on deep learning (DL) have demonstrated strong feature-extraction capabilities. However, their performance often degrades under strong noise interference, which limits their applicability in real-world industrial settings. To address this issue, this paper proposes a novel attention-enhanced Transformer model that integrates large-kernel convolution with a multiscale convolutional neural network (CNN) structure for noise-robust fault diagnosis. The framework combines spatiotemporal feature modeling with adaptive frequency-domain enhancement, which allows it to suppress noise while emphasizing informative diagnostic features. Experiments on the Paderborn University and Case Western Reserve University bearing datasets show that the proposed method achieves higher recognition accuracy than several state-of-the-art models across a range of signal-to-noise ratios. Ablation studies and visualization analyses further validate the effectiveness of the individual components and the soundness of the overall architecture.
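The abstract does not state how the noisy test conditions are generated. A common convention in noise-robustness studies, assumed here rather than taken from this paper, is to add zero-mean white Gaussian noise scaled to a target signal-to-noise ratio, where SNR_dB = 10 log10(P_signal / P_noise) and therefore P_noise = P_signal / 10^(SNR_dB / 10). The minimal Python sketch that follows illustrates only that calculation. The function names `add_awgn` and `snr_db`, the synthetic 50 Hz sine used as a stand-in for a vibration segment, the 12 kHz sampling rate, and the SNR levels are all hypothetical choices for illustration.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a copy of `signal` corrupted by white Gaussian noise at `snr_db` dB.

    Uses SNR_dB = 10 * log10(P_signal / P_noise), so the required noise power
    is P_noise = P_signal / 10 ** (snr_db / 10).
    """
    rng = random.Random(seed)                              # seeded for reproducibility
    p_signal = sum(x * x for x in signal) / len(signal)    # mean signal power
    p_noise = p_signal / (10 ** (snr_db / 10))             # target noise power
    sigma = math.sqrt(p_noise)                             # std of zero-mean Gaussian noise
    return [x + rng.gauss(0.0, sigma) for x in signal]


def snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    # Hypothetical stand-in for a vibration segment: a 50 Hz sine sampled at 12 kHz for 1 s.
    fs, f0, n = 12_000, 50.0, 12_000
    clean = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    for target in (-4, 0, 4):  # illustrative SNR levels, not the paper's settings
        noisy = add_awgn(clean, target, seed=0)
        print(f"target {target:+d} dB -> measured {snr_db(clean, noisy):+.2f} dB")
```

Because the noise power is derived from the power of each individual segment, lower SNR values correspond to proportionally stronger noise regardless of the signal's amplitude, which is what makes accuracy comparisons across SNR levels meaningful.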