Abstract
Bearing anomaly detection with data-driven deep learning models is hindered by imbalanced sample distributions and complex operating conditions, which often lead to overfitting and high false positive rates in real-world scenarios. This paper proposes a strategy that combines multimodal fusion with Self-Adversarial Training (SAT) to construct and train a deep learning model. First, one-dimensional bearing vibration time series are converted into Gramian Angular Difference Field (GADF) images, and the image features are fused with features from the original time series to capture richer spatiotemporal correlations. Second, a composite data augmentation strategy combining time-domain and image-domain transformations expands the set of anomaly samples, mitigating data scarcity and class imbalance. Finally, the SAT mechanism generates adversarial samples within the fused feature space, compelling the model to learn more general and robust feature representations and thereby improving its performance in realistic, noisy environments. Experimental results show that the proposed method outperforms traditional baseline models in accuracy, precision, recall, and F1-score for bearing anomaly detection. It is also robust to rail-specific interference, offering a solution tailored to the high-noise operating environments of intelligent railway maintenance.
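For readers unfamiliar with the GADF encoding named above, the following is a minimal sketch of the standard construction: the series is rescaled to [-1, 1], each value is mapped to an angle via arccos, and the image entry (i, j) is the sine of the angle difference. The function name `gadf`, the min-max rescaling, and the synthetic sine input are illustrative assumptions, not the implementation or data used in this paper.

```python
import numpy as np


def gadf(series):
    """Return the Gramian Angular Difference Field of a 1-D series.

    Hypothetical helper for illustration: min-max rescaling to [-1, 1]
    is assumed; the paper's exact preprocessing is not specified here.
    """
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series carries no angular information; map it to 0.
        scaled = np.zeros_like(x)
    else:
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    # Guard against rounding slightly outside the domain of arccos.
    scaled = np.clip(scaled, -1.0, 1.0)
    phi = np.arccos(scaled)  # polar-angle encoding of each sample
    # Entry (i, j) is sin(phi_i - phi_j), computed by broadcasting.
    return np.sin(phi[:, None] - phi[None, :])


# Synthetic stand-in for a vibration segment (assumed, not real bearing data).
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
image = gadf(signal)
```

A segment of length n yields an n × n image that is antisymmetric with a zero diagonal, so it can be passed to an image branch alongside the raw series in a fusion model.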